# Machine Learning - Predicting Treatment Abandonment with scikit-learn
By **Daniel Palacio** (github.com/palaciodaniel) - 2020

## STEP ONE - Building the DataFrame

To build the table, we need the Faker package. For obvious privacy reasons, real patient data has been ruled out, so we will build a fictional DataFrame from scratch. If Faker is not installed, remove the "#" sign from the following cell and run it.

**DO NOT RUN THE REST OF THE CELLS, OTHERWISE THEIR OUTPUT WILL BE LOST.**

In [1]:
# pip install Faker

1.  We import the Faker and random libraries. We will use Faker to create two columns, "Name" and "Gender" ([M]ale and [F]emale), and random to generate an "Age" column with values from 18 to 85.

    All the information will be stored in a list of lists, with every nested list representing a single observation. We will generate 100 observations, so the list should have that length.

    Notice that both Faker and random are seeded, so that we obtain the same results every time.

In [2]:
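from faker import Faker
import random

fake = Faker()
Faker.seed(24)  # fixed seed so Faker returns the same names on every run

names_list = []

for i in range(100):
    random.seed(i)  # reseed per row, so each age depends only on the row index
    profile = fake.simple_profile()
    # one observation = [name, sex ('M' or 'F'), age from 18 to 85 inclusive]
    names_list.append([profile["name"], profile["sex"], random.randint(18, 85)])

print("List length:", len(names_list), "\n")
print(names_list)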

List length: 100 

[['Clayton Murillo', 'M', 67], ['Mary Johnson', 'F', 35], ['Angela Taylor', 'F', 25], ['Sarah Massey', 'F', 48], ['Joseph Lam', 'M', 48], ['Corey Hernandez', 'M', 50], ['Colton Bryan', 'M', 28], ['Kristy Young', 'F', 59], ['Carla Bradford', 'F', 47], ['Jane Brown', 'F', 77], ['Traci Alvarez', 'F', 22], ['Jennifer Sexton', 'F', 75], ['Miss Jenna Beasley', 'F', 78], ['Matthew Francis', 'M', 51], ['Krystal Robinson', 'F', 31], ['Christopher Campbell', 'M', 44], ['Cassandra Becker', 'F', 64], ['Christopher Harrison', 'M', 84], ['Alex Morales', 'M', 41], ['Jerry Rodgers', 'M', 23], ['Matthew Davis', 'M', 37], ['Ralph Stewart', 'M', 39], ['Beth Rodriguez', 'F', 35], ['Eduardo Cordova', 'M', 55], ['Preston Long', 'M', 67], ['Alyssa Stein', 'F', 66], ['Terri Hernandez', 'F', 43], ['Peggy Bowen', 'F', 79], ['Jose Leblanc', 'M', 32], ['Wanda Wilson', 'F', 27], ['Natasha Holt', 'F', 55], ['Dr. Mary Neal', 'F', 19], ['Eddie Buckley', 'M', 27], ['Jose Ward', 'M', 39], ['Justin Le
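
Two details of the cell above are worth making explicit. `random.randint(a, b)` includes both endpoints, so 18 and 85 are both possible ages; and because `random.seed(i)` is called at the start of every iteration, each row's age depends only on its index `i`, not on how many numbers were drawn before it. The sketch below illustrates both points with the standard library only; the helper `age_for_row` is hypothetical and does not appear in the notebook.

```python
import random


def age_for_row(i: int) -> int:
    """Hypothetical helper: reseed with the row index, then draw one age."""
    random.seed(i)
    # randint includes both endpoints, so 18 and 85 can both be returned
    return random.randint(18, 85)


# Drawing the rows forwards or backwards gives the same age for each index,
# because every draw starts from a freshly seeded generator state.
forward = [age_for_row(i) for i in range(100)]
backward = [age_for_row(i) for i in reversed(range(100))][::-1]

print(forward == backward)
print(min(forward) >= 18 and max(forward) <= 85)
```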

2.  We will now generate columns for several fictional features. Using a real test was ruled out to avoid potential copyright infringement; even so, this demonstration shows how a Machine Learning approach could be applied to a situation like this.

    Every feature value will be drawn at random from three possible choices: "Low", "Medium" and "High" (Victimhood, described below, is the exception).
    
    A brief description of the less obvious (and entirely fictional) features:
    
    * **Neuroticism:** Indicates how much the patient's adaptation to the demands of everyday life is compromised.
    * **Resourcefulness:** Includes not only intelligence but also any other resource that can be used to fulfill the treatment's objectives (e.g., salary, social skills).
    * **Social Expectation:** Measures the perceived level of "pressure" the patient feels from their closest social circles to overcome the problem or mental disorder.
    * **Introspection:** The ability to understand and assess all the changes the treatment produces on the patient's well-being and general quality of life.
    * **Victimhood:** Unlike the previous features, this one is either True or False. If True, the patient tends to believe life will always be inherently "harsh", whereas patients with False tend to be more "combative" and understand that change is possible.

In [3]:
LEVELS = ["Low", "Medium", "High"]

neuroticism = []
motivation = []
resourcefulness = []
social_expectation = []
introspection = []
discipline = []
victimhood = []

for i in range(100):
    random.seed(i)  # same per-row seed as in the previous cell
    # seven consecutive draws per row, always in this column order
    neuroticism.append(random.choice(LEVELS))
    motivation.append(random.choice(LEVELS))
    resourcefulness.append(random.choice(LEVELS))
    social_expectation.append(random.choice(LEVELS))
    introspection.append(random.choice(LEVELS))
    discipline.append(random.choice(LEVELS))
    victimhood.append(random.choice([True, False]))
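
One side effect of this seeding scheme deserves a note. Both this cell and the previous one call `random.seed(i)` with the same `i`, so a row's Neuroticism comes from the very same first random number as its Age. In CPython, `randint(18, 85)` and `choice` over a three-item list both take their first candidate from the top bits of that number, so whenever the age is accepted on the first try, ages 18 to 49 pair with "Low" and ages 50 to 81 with "Medium". Most of the rows printed in this notebook follow that pattern (e.g. Mary Johnson, 35, Low; Clayton Murillo, 67, Medium). Changing the seeding now would alter every row and invalidate the hand-entered labels, so the sketch below is only an option for a future run: it assumes one independently seeded `random.Random` instance per column, and its seed values are hypothetical.

```python
import random

LEVELS = ["Low", "Medium", "High"]
FEATURES = ["Neuroticism", "Motivation", "Resourcefulness",
            "Social Expectation", "Introspection", "Discipline"]


def build_rows(n=100):
    """Draw every column from its own generator, so no two columns share draws."""
    # Hypothetical seeds; any distinct fixed values keep the run reproducible
    age_rng = random.Random(1000)
    feature_rngs = {name: random.Random(2000 + k) for k, name in enumerate(FEATURES)}
    victim_rng = random.Random(3000)

    rows = []
    for _ in range(n):
        row = {"Age": age_rng.randint(18, 85)}
        for name, rng in feature_rngs.items():
            row[name] = rng.choice(LEVELS)
        row["Victimhood"] = victim_rng.choice([True, False])
        rows.append(row)
    return rows


rows = build_rows()
print(rows[0])
```

The names would still come from Faker as before; only the random columns change, and a fixed seed per generator keeps the whole table reproducible.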

3. Now we import the pandas module and transform the nested lists from Step 1 into a proper DataFrame. 

##### NOTE: The column 'Genre' should be 'Gender' or 'Sex'; this is a typo. It was left unchanged to avoid re-running the manual labeling cell (In [7]), and it will be fixed in the second phase.

In [4]:
import pandas as pd

patient_df = pd.DataFrame(names_list, columns=["Name", "Genre", "Age"])
print(patient_df.head())

              Name Genre  Age
0  Clayton Murillo     M   67
1     Mary Johnson     F   35
2    Angela Taylor     F   25
3     Sarah Massey     F   48
4       Joseph Lam     M   48
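
As the note above says, the 'Genre' label is a typo. When it is fixed in the second phase, `DataFrame.rename` with a `columns` mapping is one way to do it without touching the data or re-running the manual labeling. The small frame below is a hypothetical stand-in built from the first two rows printed above.

```python
import pandas as pd

# Stand-in for patient_df, using two of the rows printed above
sample = pd.DataFrame([["Clayton Murillo", "M", 67], ["Mary Johnson", "F", 35]],
                      columns=["Name", "Genre", "Age"])

# rename returns a new DataFrame; only the mapped column label changes
fixed = sample.rename(columns={"Genre": "Gender"})
print(fixed.columns.tolist())
```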


4. We also gather the features into a DataFrame.

In [5]:
patient_df2 = pd.DataFrame(list(zip(neuroticism, motivation, resourcefulness,
                                    social_expectation, introspection, discipline, victimhood)),
                          columns = ["Neuroticism", "Motivation", "Resourcefulness", "Social Expectation",
                                     "Introspection", "Discipline", "Victimhood"])

print("DataFrame's shape:", patient_df2.shape, "\n")
print(patient_df2.head())

DataFrame's shape: (100, 7) 

  Neuroticism Motivation Resourcefulness Social Expectation Introspection  \
0      Medium     Medium             Low             Medium          High   
1         Low       High             Low             Medium           Low   
2         Low        Low             Low             Medium           Low   
3         Low       High            High                Low        Medium   
4         Low     Medium             Low               High        Medium   

  Discipline  Victimhood  
0     Medium       False  
1     Medium       False  
2       High       False  
3       High       False  
4     Medium        True  
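
The three levels are ordered ("Low" < "Medium" < "High"), but in `patient_df2` they are plain strings, so pandas would sort them alphabetically ("High", "Low", "Medium"). One option, not used in this notebook, is to store such a column as an ordered `pd.Categorical`, which keeps the intended order and exposes integer codes that could later feed a scikit-learn model. A minimal sketch, using the first five Neuroticism values printed above:

```python
import pandas as pd

levels = ["Low", "Medium", "High"]
neuroticism = pd.Series(["Medium", "Low", "Low", "Low", "Low"])

# Declare the category order explicitly so sorting and comparisons follow it
ordered = pd.Categorical(neuroticism, categories=levels, ordered=True)

print(ordered.codes.tolist())      # integer codes: Low=0, Medium=1, High=2
print((ordered > "Low").tolist())  # comparisons respect the declared order
```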


5. We combine the DataFrames from both previous steps into a single one.

In [6]:
patient_final_df = pd.concat([patient_df, patient_df2], axis = 1)

print("Final DataFrame's shape:", patient_final_df.shape, "\n")
print(patient_final_df.head())

Final DataFrame's shape: (100, 10) 

              Name Genre  Age Neuroticism Motivation Resourcefulness  \
0  Clayton Murillo     M   67      Medium     Medium             Low   
1     Mary Johnson     F   35         Low       High             Low   
2    Angela Taylor     F   25         Low        Low             Low   
3     Sarah Massey     F   48         Low       High            High   
4       Joseph Lam     M   48         Low     Medium             Low   

  Social Expectation Introspection Discipline  Victimhood  
0             Medium          High     Medium       False  
1             Medium           Low     Medium       False  
2             Medium           Low       High       False  
3                Low        Medium       High       False  
4               High        Medium     Medium        True  
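
`pd.concat(..., axis=1)` places frames side by side by matching index labels, not row positions. It works here because `patient_df` and `patient_df2` were both built with the default 0 to 99 index; if one of them had been filtered or reordered first, the mismatched labels would produce missing values, as this small hypothetical example shows:

```python
import pandas as pd

left = pd.DataFrame({"Age": [67, 35, 25]})                   # index 0, 1, 2
right = pd.DataFrame({"Motivation": ["Medium", "High", "Low"]},
                     index=[1, 2, 3])                       # shifted index

# Label-based alignment: rows 0 and 3 exist in only one frame, so each gets a NaN
combined = pd.concat([left, right], axis=1)
print(combined)
print(combined.isna().sum().to_dict())

# Dropping both indexes first forces a purely positional pairing
positional = pd.concat([left.reset_index(drop=True),
                        right.reset_index(drop=True)], axis=1)
print(positional)
```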


6.  We are now ready to create the target column. The following code prints each observation and lets us enter a "realistic" treatment outcome based on all of its features: did the patient finish the treatment?

    Each answer is appended to a list, which is printed once all 100 observations have been examined.

In [7]:
finished_treatment = []

for i in range(len(patient_final_df)):
    print(patient_final_df.loc[i, :])  # show every feature of row i
    your_input = input("Finished treatment? (TRUE/FALSE): ")
    finished_treatment.append(your_input)  # stored exactly as typed, i.e. as a string

print(finished_treatment)

Name                  Clayton Murillo
Genre                               M
Age                                67
Neuroticism                    Medium
Motivation                     Medium
Resourcefulness                   Low
Social Expectation             Medium
Introspection                    High
Discipline                     Medium
Victimhood                      False
Name: 0, dtype: object
Finished treatment? (TRUE/FALSE): False
Name                  Mary Johnson
Genre                            F
Age                             35
Neuroticism                    Low
Motivation                    High
Resourcefulness                Low
Social Expectation          Medium
Introspection                  Low
Discipline                  Medium
Victimhood                   False
Name: 1, dtype: object
Finished treatment? (TRUE/FALSE): True
Name                  Angela Taylor
Genre                             F
Age                              25
Neuroticism                     Low
M

Finished treatment? (TRUE/FALSE): False
Name                  Jerry Rodgers
Genre                             M
Age                              23
Neuroticism                    High
Motivation                      Low
Resourcefulness                High
Social Expectation              Low
Introspection                  High
Discipline                      Low
Victimhood                    False
Name: 19, dtype: object
Finished treatment? (TRUE/FALSE): True
Name                  Matthew Davis
Genre                             M
Age                              37
Neuroticism                    High
Motivation                     High
Resourcefulness                 Low
Social Expectation           Medium
Introspection                  High
Discipline                     High
Victimhood                     True
Name: 20, dtype: object
Finished treatment? (TRUE/FALSE): True
Name                  Ralph Stewart
Genre                             M
Age                              39
Neurot

Finished treatment? (TRUE/FALSE): False
Name                  Anthony Beck
Genre                            M
Age                             44
Neuroticism                    Low
Motivation                  Medium
Resourcefulness             Medium
Social Expectation             Low
Introspection                  Low
Discipline                     Low
Victimhood                   False
Name: 39, dtype: object
Finished treatment? (TRUE/FALSE): False
Name                  Christopher Becker
Genre                                  M
Age                                   76
Neuroticism                       Medium
Motivation                          High
Resourcefulness                     High
Social Expectation                   Low
Introspection                        Low
Discipline                        Medium
Victimhood                          True
Name: 40, dtype: object
Finished treatment? (TRUE/FALSE): True
Name                  Lisa Hendricks
Genre                              F

Finished treatment? (TRUE/FALSE): False
Name                  Nicole Russell
Genre                              F
Age                               46
Neuroticism                      Low
Motivation                       Low
Resourcefulness                 High
Social Expectation            Medium
Introspection                    Low
Discipline                       Low
Victimhood                     False
Name: 59, dtype: object
Finished treatment? (TRUE/FALSE): False
Name                  Mary Sanchez
Genre                            F
Age                             57
Neuroticism                 Medium
Motivation                  Medium
Resourcefulness               High
Social Expectation             Low
Introspection               Medium
Discipline                     Low
Victimhood                   False
Name: 60, dtype: object
Finished treatment? (TRUE/FALSE): False
Name                  James Spence
Genre                            M
Age                             81
Neuroti

Finished treatment? (TRUE/FALSE): True
Name                  Christy Shepard
Genre                               F
Age                                36
Neuroticism                       Low
Motivation                     Medium
Resourcefulness                Medium
Social Expectation               High
Introspection                     Low
Discipline                        Low
Victimhood                       True
Name: 79, dtype: object
Finished treatment? (TRUE/FALSE): True
Name                  Margaret Greer
Genre                              F
Age                               52
Neuroticism                   Medium
Motivation                    Medium
Resourcefulness                 High
Social Expectation              High
Introspection                 Medium
Discipline                    Medium
Victimhood                     False
Name: 80, dtype: object
Finished treatment? (TRUE/FALSE): False
Name                  Ryan Spencer
Genre                            M
Age           

Finished treatment? (TRUE/FALSE): True
Name                  Allen Smith
Genre                           M
Age                            69
Neuroticism                Medium
Motivation                 Medium
Resourcefulness               Low
Social Expectation           High
Introspection                 Low
Discipline                    Low
Victimhood                   True
Name: 99, dtype: object
Finished treatment? (TRUE/FALSE): False
['False', 'True', 'False', 'True', 'False', 'True', 'False', 'False', 'True', 'False', 'False', 'True', 'True', 'True', 'True', 'False', 'False', 'False', 'False', 'True', 'True', 'True', 'False', 'False', 'False', 'True', 'True', 'True', 'True', 'True', 'True', 'True', 'False', 'True', 'False', 'False', 'False', 'True', 'False', 'False', 'True', 'True', 'False', 'False', 'True', 'False', 'False', 'True', 'True', 'True', 'False', 'False', 'True', 'True', 'True', 'False', 'False', 'True', 'False', 'False', 'False', 'False', 'False', 'False', 'True', 'T
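
The cell above stores whatever is typed, as a string, so a typo such as "Ture" would silently become a third class. A minimal sketch of a stricter prompt is shown below; the helpers `parse_outcome` and `ask_outcome` are hypothetical, and the `reader` parameter exists only so the loop can be exercised without a keyboard.

```python
def parse_outcome(text: str) -> bool:
    """Map a typed answer to a bool, ignoring letter case and surrounding spaces."""
    cleaned = text.strip().lower()
    if cleaned == "true":
        return True
    if cleaned == "false":
        return False
    raise ValueError(f"expected True or False, got {text!r}")


def ask_outcome(prompt="Finished treatment? (True/False): ", reader=input):
    """Keep asking until the answer parses; reader defaults to input()."""
    while True:
        try:
            return parse_outcome(reader(prompt))
        except ValueError as err:
            print(err)  # report the bad answer and ask again


# Usage with scripted answers instead of the keyboard
answers = iter(["Ture", " FALSE "])
print(ask_outcome(reader=lambda prompt: next(answers)))
```

Returning real booleans also keeps the target column free of mixed spellings such as "TRUE" and "True".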

7. We append the target column to the combined DataFrame, which is now complete.

In [8]:
patient_final_df["Finished"] = finished_treatment
print(patient_final_df.head())

              Name Genre  Age Neuroticism Motivation Resourcefulness  \
0  Clayton Murillo     M   67      Medium     Medium             Low   
1     Mary Johnson     F   35         Low       High             Low   
2    Angela Taylor     F   25         Low        Low             Low   
3     Sarah Massey     F   48         Low       High            High   
4       Joseph Lam     M   48         Low     Medium             Low   

  Social Expectation Introspection Discipline  Victimhood Finished  
0             Medium          High     Medium       False    False  
1             Medium           Low     Medium       False     True  
2             Medium           Low       High       False    False  
3                Low        Medium       High       False     True  
4               High        Medium     Medium        True    False  
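
Assigning a plain list to a new column works only when the list has exactly one value per row; otherwise pandas raises a `ValueError`, which acts as a cheap guard against skipping an observation during the manual input. A small hypothetical illustration:

```python
import pandas as pd

df = pd.DataFrame({"Age": [67, 35, 25]})

df["Finished"] = ["False", "True", "False"]  # one label per row: accepted
print(df.shape)

try:
    df["Finished"] = ["False", "True"]       # one label short: rejected
except ValueError as err:
    print("Rejected:", err)
```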


8. We save the DataFrame to a CSV file, ready to be cleaned and prepared in Step Two.

In [9]:
patient_final_df.to_csv("df_patients.csv", index = True)
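
Because the file is written with `index=True`, the row labels become the first, unnamed column of the CSV. When the file is loaded again in the next step, passing `index_col=0` to `pd.read_csv` restores them as the index instead of adding an extra "Unnamed: 0" column; pandas also parses the "True"/"False" text back into booleans by default. The sketch round-trips a two-row stand-in through an in-memory buffer rather than the real `df_patients.csv`.

```python
import io

import pandas as pd

# Two-row stand-in for patient_final_df (values copied from the output above)
original = pd.DataFrame({"Name": ["Clayton Murillo", "Mary Johnson"],
                         "Age": [67, 35],
                         "Finished": ["False", "True"]})

buffer = io.StringIO()
original.to_csv(buffer, index=True)  # same call style as the cell above
buffer.seek(0)

restored = pd.read_csv(buffer, index_col=0)  # first CSV column becomes the index
print(restored.columns.tolist())
print(restored["Finished"].tolist())
```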