Set value of first column from input #12
Comments
Hi @ninamwa, thanks for your question. Unfortunately, the current version doesn't support this functionality. However, I made some quick-and-dirty changes that will do the job:
diff --git a/synthpop/synthpop.py b/synthpop/synthpop.py
index 0b36c0e..e72b56a 100644
--- a/synthpop/synthpop.py
+++ b/synthpop/synthpop.py
@@ -81,11 +81,11 @@ class Synthpop:
             # save the method
             self.saved_methods[col] = col_method
-    def generate(self, k=None):
-        self.k = k
+    def generate(self, condition_df):
+        self.condition_df = condition_df
         # check generate
-        self.validator.check_generate()
+        # self.validator.check_generate()
         # generate
         synth_df = self._generate()
         # postprocess
@@ -94,9 +94,12 @@ class Synthpop:
         return processed_synth_df
     def _generate(self):
-        synth_df = pd.DataFrame(data=np.zeros([self.k, len(self.visit_sequence)]), columns=self.visit_sequence.index)
+        synth_df = pd.DataFrame(data=np.zeros([len(self.condition_df), len(self.visit_sequence)]), columns=self.visit_sequence.index)
+        synth_df[list(self.condition_df.columns)] = self.condition_df
         for col, visit_step in self.visit_sequence.sort_values().iteritems():
+            if col in list(self.condition_df.columns):
+                continue
             print('generate_{}'.format(col))
             # reload the method
import pandas as pd

from synthpop import Synthpop
from datasets.adult import df, dtypes

spop = Synthpop()
spop.fit(df, dtypes)

condition_df = pd.DataFrame({"age": [25, 80, 20, 50, 47]})
synth_df = spop.generate(condition_df)
print(synth_df.head())

Hope that helps.
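The idea behind the patch above (pre-fill the user-supplied columns, then skip them while walking the visit sequence) can be illustrated without the library. The following is only a minimal sketch under stated assumptions: `generate_conditioned`, the toy samplers, and the `hours` and `income` columns are hypothetical names invented for illustration and are not part of synthpop. It also works row by row with the standard library, whereas the patch fills whole DataFrame columns.

```python
import random


def generate_conditioned(visit_sequence, samplers, condition_rows, seed=0):
    """Generate one synthetic row per conditioning row (hypothetical helper).

    visit_sequence: column names in generation order.
    samplers: dict mapping a column name to a function (partial_row, rng) -> value.
    condition_rows: list of dicts holding the user-supplied values; every
        dict is assumed to have the same keys.
    """
    rng = random.Random(seed)
    fixed = set(condition_rows[0]) if condition_rows else set()
    synth = []
    for cond in condition_rows:
        row = dict(cond)  # start from the user-supplied values
        for col in visit_sequence:
            if col in fixed:
                continue  # conditioned column: keep the input, do not sample
            row[col] = samplers[col](row, rng)
        synth.append(row)
    return synth


# Toy samplers standing in for fitted per-column models (assumptions).
def sample_hours(row, rng):
    base = 40 if row["age"] < 65 else 10
    return base + rng.randint(-5, 5)


def sample_income(row, rng):
    return ">50K" if row["hours"] > 38 and rng.random() < 0.5 else "<=50K"


SAMPLERS = {"hours": sample_hours, "income": sample_income}

condition_rows = [{"age": a} for a in [25, 80, 20, 50, 47]]
synth = generate_conditioned(["age", "hours", "income"], SAMPLERS, condition_rows)
for row in synth:
    print(row)
```

Note that in this sketch the conditioned column has to come before any column whose sampler reads it, which is why the example fixes the first column in the visit sequence.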
Thank you very much, this was exactly what I needed :-)
Hi again. I am guessing it has something to do with being able to use random.choice over the possibilities in each leaf, instead of getting the same result each time, but I was not sure :)
That's exactly right. My goal was to replicate the methodology from the R package and this is the authors' approach.
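To make the difference discussed above concrete, here is a small sketch contrasting a deterministic leaf prediction with a random draw from the values observed in that leaf. It is an illustration only, not the library's code: `leaf_values`, `predict_point`, and `predict_sampled` are hypothetical names, and the numbers are made up.

```python
import random
from statistics import mean

# Hypothetical leaf: target values of the training rows that ended up in it.
leaf_values = [38, 40, 40, 45, 60]


def predict_point(values):
    """Deterministic prediction: the same value for every row in this leaf."""
    return mean(values)


def predict_sampled(values, rng):
    """Stochastic prediction: draw one of the values observed in the leaf."""
    return rng.choice(values)


rng = random.Random(42)
draws = [predict_sampled(leaf_values, rng) for _ in range(10)]

print("point prediction:", predict_point(leaf_values))
print("sampled predictions:", draws)
```

Sampling keeps every synthetic value within the set of observed values and lets rows that fall in the same leaf differ from one another, whereas the point prediction repeats a single value.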
Hi.
I was wondering if it is possible to set the values of the first column from user input or other data.
For example: I want to generate 10 new rows of data, but I do not want the age of each person (which is the first column) to be randomly sampled from the observed data. I want it to be set based on input from the user :-)