Charts: stratified random sampling based on a target #490

reza1615 · 2021-05-21T18:34:06Z

Currently, the random sampling for charts doesn't apply stratified sampling based on a target.
It would be helpful to have a dropdown to set the target column and generate stratified samples based on it.
That way the sample keeps the same target distribution as the whole data set.

harshithakolipaka · 2023-03-05T09:40:02Z

Hey, I would like to work on this issue. Can you point me to the relevant source code so that I can add the feature?

aschonfeld · 2023-03-06T14:40:47Z

@harshithakolipaka thanks for your interest in this feature! To give you some background on how this will work: you'll have to update the run_query function to accept a boolean keyword parameter named stratified_random_sample. The call will look something like this:

run_query(
    handle_predefined(data_id),
    query,
    global_state.get_context_variables(data_id),
    pct=inputs.get("load"),
    pct_type=inputs.get("load_type"),
    stratified_random_sample=True,
)

The run_query function is located here

The idea is to check this boolean parameter and, if it's True, run your random sampling code. Run it on the df parameter passed into run_query, which is the dataframe you've loaded into D-Tale. You'll probably want to call your function before anything else in run_query. Lastly, your random sampling function should return a dataframe, which the rest of run_query will then use.
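
As a rough illustration of what such a sampling function could look like, here is a minimal sketch using pandas. The names stratified_sample, maybe_sample, target and sampling_rate are hypothetical and are not taken from the D-Tale code base; only the stratified_random_sample flag comes from the description above. It assumes pandas 1.1 or later, where groupby(...).sample is available.

```python
import pandas as pd


def stratified_sample(df, target, sampling_rate, random_state=None):
    """Draw the same fraction of rows from every distinct value of `target`.

    Hypothetical helper, not D-Tale's actual implementation.
    """
    if not 0 < sampling_rate <= 1:
        raise ValueError("sampling_rate must be in (0, 1]")
    # groupby drops rows whose target is NaN by default, so those rows
    # will not appear in the sample.
    return df.groupby(target).sample(frac=sampling_rate, random_state=random_state)


def maybe_sample(df, stratified_random_sample=False, target=None, sampling_rate=1.0):
    """Apply stratified sampling only when the boolean flag is set (assumed wiring)."""
    if stratified_random_sample and target is not None:
        return stratified_sample(df, target, sampling_rate)
    return df
```

Calling maybe_sample(df, stratified_random_sample=True, target="label", sampling_rate=0.3) returns a smaller dataframe in which each label keeps roughly its original share of rows; with the flag left False the dataframe is returned unchanged.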

Honestly, if you'd just like to provide the random sampling function I can plug in the rest and make sure you still get credit for the code you've committed. Let me know if you have any other questions. Thanks

aschonfeld · 2023-04-10T17:46:25Z

@reza1615 I'm looking to implement this but wanted to know which of these two solutions is more suitable:

https://stackoverflow.com/a/44115314
https://stackoverflow.com/a/74493296 (and if this one, what should the value of sampling_rate be set to, and would an input be made available on the UI for it?)

reza1615 · 2023-04-10T19:48:55Z

Hi, the second one. The sampling_rate is the ratio chosen to downsample the data. You already have it for downsampling: for example, if the user selects 30% in the UI, the sampling_rate is 0.3.
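
To make the arithmetic concrete, here is a small sketch in plain Python. The function names pct_to_sampling_rate and rows_per_class are made up for illustration and are not part of D-Tale; the sketch only shows how a UI percentage could map to sampling_rate and how many rows each class would then contribute.

```python
from collections import Counter


def pct_to_sampling_rate(pct):
    """Convert a UI percentage such as 30 into a fraction such as 0.3."""
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    return pct / 100


def rows_per_class(targets, sampling_rate):
    """Count how many rows to draw from each class at the given rate.

    Each class is scaled by the same rate, so class proportions are kept
    up to rounding.
    """
    return {cls: round(n * sampling_rate) for cls, n in Counter(targets).items()}
```

For a target column with 80 rows of class "a" and 20 of class "b", a 30% selection gives a rate of 0.3, so 24 rows are drawn from "a" and 6 from "b", which keeps the original 80/20 split.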

reza1615 · 2023-04-10T19:51:19Z

aschonfeld · 2023-04-29T11:53:07Z

@reza1615 @harshithakolipaka just released v2.15.0 with this feature. Let me know if you have any issues.

reza1615 changed the title ~~stratified random sampling based on a target~~ Charts: stratified random sampling based on a target May 21, 2021

aschonfeld added the enhancement New feature or request label May 21, 2021

aschonfeld added a commit that referenced this issue Apr 27, 2023

#490: stratified random sample in charts

f926633

aschonfeld added a commit that referenced this issue Apr 28, 2023

#490: stratified random sample in charts

3d5357e

aschonfeld added a commit that referenced this issue Apr 28, 2023

#490: stratified random sample in charts

3a3e721

aschonfeld added a commit that referenced this issue Apr 28, 2023

#490: stratified random sample in charts

9449e47

aschonfeld added a commit that referenced this issue Apr 28, 2023

#490: stratified random sample in charts

c4a237d

aschonfeld added a commit that referenced this issue Apr 28, 2023

#490: stratified random sample in charts

4b92d93

aschonfeld added a commit that referenced this issue Apr 28, 2023

#490: stratified random sample in charts

08cf336

aschonfeld added a commit that referenced this issue Apr 28, 2023

#490: stratified random sample in charts

8faf267

aschonfeld added a commit that referenced this issue Apr 28, 2023

#490: stratified random sample in charts

a8c7895

aschonfeld added a commit that referenced this issue Apr 28, 2023

#490: stratified random sample in charts

69a1f1c

aschonfeld closed this as completed Apr 29, 2023
