Set value of first column from input #12

Closed
ninamwa opened this issue Dec 8, 2020 · 4 comments

ninamwa commented Dec 8, 2020

Hi.
I was wondering if it is possible to set the values of the first columns from user input or other data?

Ex: I want to generate 10 new rows of data, but I do not want the age of each person (which is the first column) to be randomly sampled from the observed data; I want it to be set based on input from the user :-)

Collaborator

ganevgv commented Dec 18, 2020

Hi @ninamwa,

Thanks for your question. Unfortunately, the current version doesn't support this functionality. However, I managed to make some quick and dirty changes that will do the job.

  1. When installing the package from source, make sure that you install in develop mode, i.e. run python setup.py develop instead of python setup.py install.

  2. Make the following changes to the synthpop/synthpop.py file, i.e. delete the lines marked with - (red) and add the lines marked with + (green).

diff --git a/synthpop/synthpop.py b/synthpop/synthpop.py
index 0b36c0e..e72b56a 100644
--- a/synthpop/synthpop.py
+++ b/synthpop/synthpop.py
@@ -81,11 +81,11 @@ class Synthpop:
             # save the method
             self.saved_methods[col] = col_method

-    def generate(self, k=None):
-        self.k = k
+    def generate(self, condition_df):
+        self.condition_df = condition_df

         # check generate
-        self.validator.check_generate()
+        # self.validator.check_generate()
         # generate
         synth_df = self._generate()
         # postprocess
@@ -94,9 +94,12 @@ class Synthpop:
         return processed_synth_df

     def _generate(self):
-        synth_df = pd.DataFrame(data=np.zeros([self.k, len(self.visit_sequence)]), columns=self.visit_sequence.index)
+        synth_df = pd.DataFrame(data=np.zeros([len(self.condition_df), len(self.visit_sequence)]), columns=self.visit_sequence.index)
+        synth_df[list(self.condition_df.columns)] = self.condition_df

         for col, visit_step in self.visit_sequence.sort_values().iteritems():
+            if col in list(self.condition_df.columns):
+                continue
             print('generate_{}'.format(col))

             # reload the method
  3. Here's an example that generates synthetic data whose first column contains the ages [25, 80, 20, 50, 47].
import pandas as pd

from synthpop import Synthpop
from datasets.adult import df, dtypes


spop = Synthpop()
spop.fit(df, dtypes)

condition_df = pd.DataFrame({"age": [25, 80, 20, 50, 47]})
synth_df = spop.generate(condition_df)
print(synth_df.head())
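
For readers who want to see the idea behind the patch in isolation, here is a minimal toy sketch that does not use the synthpop package at all. The function name generate_conditioned and the samplers dict are hypothetical and only stand in for the fitted per-column methods; the point is just the two steps from the diff: pre-fill the user-supplied columns, then skip them while walking the visit sequence.

```python
import numpy as np
import pandas as pd


def generate_conditioned(condition_df, visit_sequence, samplers):
    """Toy stand-in for the patched _generate loop (not the package's code).

    condition_df   -- user-supplied columns, one row per synthetic record
    visit_sequence -- column names in the order they are synthesised
    samplers       -- hypothetical dict: column -> callable(partial_df) -> values
    """
    n_rows = len(condition_df)
    synth_df = pd.DataFrame(
        np.zeros((n_rows, len(visit_sequence))), columns=visit_sequence
    )

    # Step 1: pre-fill the columns supplied by the user.
    # to_numpy() copies by position, so the index of condition_df is ignored.
    fixed = list(condition_df.columns)
    for col in fixed:
        synth_df[col] = condition_df[col].to_numpy()

    # Step 2: walk the visit sequence, skipping the pre-filled columns.
    for col in visit_sequence:
        if col in fixed:
            continue
        synth_df[col] = samplers[col](synth_df)

    return synth_df


# Usage with a deterministic, made-up rule in place of a fitted model
toy_condition_df = pd.DataFrame({"age": [25, 80, 20]})
toy_samplers = {"hours": lambda partial: np.where(partial["age"] > 60, 20, 40)}
toy_synth_df = generate_conditioned(toy_condition_df, ["age", "hours"], toy_samplers)
print(toy_synth_df)
```

Note that the patch also comments out the check_generate validation, so it is worth checking yourself that every column in condition_df exists in the fitted data before calling generate.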

Hope that helps.

Author

ninamwa commented Dec 28, 2020

Thank you very much, this was exactly what I needed :-)

ninamwa closed this as completed Dec 28, 2020

Author

ninamwa commented Dec 28, 2020

Hi again.
I also have a question about the implementation.
Why did you choose to use the apply method for the decision trees when predicting new data, instead of simply using the built-in predict method?

I am guessing it has something to do with being able to use random.choice over the candidate values in each leaf, instead of getting the same result each time, but I was not sure :)

Collaborator

ganevgv commented Jan 6, 2021

That's exactly right. My goal was to replicate the methodology from the R package and this is the authors' approach.
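
To make the difference concrete, here is a small sketch using scikit-learn's DecisionTreeRegressor. It is not the package's actual code; the toy data and variable names are made up for illustration. predict returns the same leaf mean for every record that lands in a leaf, while apply returns the leaf id, which lets you draw one of the observed target values from that leaf at random.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Made-up training data: one predictor and one target
X_train = np.array([[20], [22], [25], [60], [65], [70]])
y_train = np.array([10, 12, 14, 40, 45, 50])

# A depth-1 tree splits the data into a "young" leaf and an "old" leaf
tree = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X_train, y_train)

X_new = np.array([[21], [68]])

# predict(): deterministic, always the mean of the training targets in the leaf
leaf_means = tree.predict(X_new)

# apply(): the id of the leaf each record falls into
train_leaves = tree.apply(X_train)
new_leaves = tree.apply(X_new)

# For each new record, draw one observed target value from the same leaf
sampled = np.array(
    [rng.choice(y_train[train_leaves == leaf]) for leaf in new_leaves]
)

print("leaf means:", leaf_means)
print("sampled   :", sampled)
```

Sampling from the leaf keeps the synthetic values within the set of observed ones and preserves the spread inside each leaf, whereas the leaf mean would collapse every record in a leaf to a single value.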
