#Autism Diagnostics: Is there a Gender Bias?

####Based on my previous project where I studied the answers and resulting diagnostics of men and women who took a screening on a phone app, my goal is to: 

*   Predict a Diagnosis Based on Answers to Screening
*   Use the results to build a visualization comparing the scores from different validation methods with the actual diagnosis and with gender, to show how predictable men and women were. I want to verify that a model tuned for accuracy is similarly accurate for both men and women.
*    Share my findings and my next steps in analyzing the data


In [0]:
#installs

!pip install plotly --upgrade

Collecting plotly
  Downloading https://files.pythonhosted.org/packages/63/2b/4ca10995bfbdefd65c4238f9a2d3fde33705d18dd50914dd13302ec1daf1/plotly-4.1.0-py2.py3-none-any.whl (7.1MB)
Installing collected packages: plotly
  Found existing installation: plotly 3.6.1
    Uninstalling plotly-3.6.1:
      Successfully uninstalled plotly-3.6.1
Successfully installed plotly-4.1.0


In [0]:
#imports 

import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
import plotly.express as px
from scipy import stats

In [0]:
#data


asd_data = 'https://raw.githubusercontent.com/shmilyface/DS-Unit-1-Sprint-5-Data-Storytelling-Blog-Post/master/csv_result-Autism-Adult-Data.csv'
columns = ['patient', 'sounds', 'big_picture', 'multitask', 'interrupt', 
           'social_comp', 'bored', 'reading_int', 'hyperfocus', 'read_faces',
           'intentions', 'age', 'gender', 'ethnicity', 'jaundice', 
           'family_autism', 'country', 'used_app', 'result', 'age_desc', 
           'relation', 'class/asd'
          ]

#Cleaning Data

> The dataset was developed utilizing input received on a research application developed by [Name]. For this analysis, I'm only interested in the questions from the survey and the age/gender of the individual. My next goal is to analyze all parts of the data to better understand the relationships between different features, in a similar fashion, or with improved methods as I learn more about the world of Data Science and Machine Learning.  



In [0]:

dropcols = ['jaundice', 'used_app', 'age_desc', 'relation', 'ethnicity', 'result']

df = pd.read_csv(asd_data, header=0, names=columns)
df.drop(dropcols, axis=1, inplace=True)
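# drop rows where age is missing (recorded as '?'); .copy() avoids chained-assignment warnings
df = df[df['age'] != '?'].copy()
df['age'] = df['age'].astype(int)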
df['gender'] = df['gender'].map({ 'm':0, 'f':1 })
df['class/asd'] = df['class/asd'].map({ 'NO':0, 'YES':1 })

From the code snippet above: 

*   Dropped the columns I'm not using in this analysis
*   Removed rows where `age` was missing (recorded as `'?'`)
*   Changed the datatype for `age` to int
*   Changed `gender` and `class/asd` to binary (0/1)
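
As a rough illustration of those cleaning steps, here is a minimal sketch on a tiny made-up frame. The rows are hypothetical, and `pd.to_numeric(..., errors='coerce')` is an alternative I'm assuming here for handling the `'?'` placeholder; it is not what the cell above does.

```python
import pandas as pd

# Hypothetical rows in the same shape as the raw file (not real records).
raw = pd.DataFrame({
    'age':       ['26', '?', '35'],
    'gender':    ['f', 'm', 'm'],
    'class/asd': ['NO', 'YES', 'NO'],
})

# '?' cannot be parsed as a number, so errors='coerce' turns it into NaN.
raw['age'] = pd.to_numeric(raw['age'], errors='coerce')
clean = raw.dropna(subset=['age']).copy()
clean['age'] = clean['age'].astype(int)

# Same binary encodings as in the cleaning cell.
clean['gender'] = clean['gender'].map({'m': 0, 'f': 1})
clean['class/asd'] = clean['class/asd'].map({'NO': 0, 'YES': 1})

print(clean)
```

Coercing first and dropping `NaN` afterwards would also catch any other non-numeric placeholder, not just `'?'`.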



In [0]:
#post-cleaning snapshot
df.head()

Unnamed: 0,patient,sounds,big_picture,multitask,interrupt,social_comp,bored,reading_int,hyperfocus,read_faces,intentions,age,gender,family_autism,country,class/asd
0,1,1,1,1,1,0,0,1,1,0,0,26,1,no,United States,0
1,2,1,1,0,1,0,0,0,1,0,1,24,0,yes,Brazil,0
2,3,1,1,0,1,1,0,1,1,1,1,27,0,yes,Spain,1
3,4,1,1,0,1,0,0,1,1,0,1,35,1,yes,United States,0
4,5,1,0,0,0,0,0,0,1,0,0,40,1,no,Egypt,0


In [0]:
df.corr()

# most strongly correlated pair of non-target features (r = 0.4788):
# 'bored'
# 'read_faces'

Unnamed: 0,patient,sounds,big_picture,multitask,interrupt,social_comp,bored,reading_int,hyperfocus,read_faces,intentions,age,gender,class/asd
patient,1.0,0.060465,0.048101,0.029516,0.008942,0.037542,-0.062995,0.043661,-0.040806,0.010181,0.017084,-0.045238,-0.018814,0.03752
sounds,0.060465,1.0,0.012033,0.070229,0.123898,0.170253,0.107769,0.219444,0.142301,0.142904,0.118341,0.023059,0.075594,0.296099
big_picture,0.048101,0.012033,1.0,0.224762,0.159718,0.151401,0.186408,-0.044838,0.035919,0.206045,0.066231,0.020824,0.044654,0.312159
multitask,0.029516,0.070229,0.224762,1.0,0.411198,0.265631,0.267671,0.078866,0.014268,0.313894,0.168516,0.029504,-0.000685,0.440248
interrupt,0.008942,0.123898,0.159718,0.411198,1.0,0.307682,0.293951,0.15215,0.004794,0.326397,0.211155,0.032539,0.056789,0.469136
social_comp,0.037542,0.170253,0.151401,0.265631,0.307682,1.0,0.39314,0.236398,0.102513,0.397423,0.265461,-0.025095,0.036949,0.538055
bored,-0.062995,0.107769,0.186408,0.267671,0.293951,0.39314,1.0,0.176153,0.097996,0.478777,0.294771,0.034705,0.083858,0.591647
reading_int,0.043661,0.219444,-0.044838,0.078866,0.15215,0.236398,0.176153,1.0,0.086408,0.190224,0.250011,-0.026533,-0.064994,0.35243
hyperfocus,-0.040806,0.142301,0.035919,0.014268,0.004794,0.102513,0.097996,0.086408,1.0,0.099381,0.100618,-0.080438,-0.064223,0.235557
read_faces,0.010181,0.142904,0.206045,0.313894,0.326397,0.397423,0.478777,0.190224,0.099381,1.0,0.28366,0.054004,-0.00687,0.635147


The correlation matrix lets me check how strongly each column moves with every other column. Each entry is a Pearson correlation coefficient between -1 and 1: values near 1 or -1 indicate a strong positive or negative linear relationship, and values near 0 indicate little or none.

Ultimately: 


`[bored]` showed a correlation of 0.478777 with `[read_faces]`, the strongest relationship between any two non-target features shown above, which was enough to warrant additional exploration. 
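
Rather than reading the matrix by eye, the strongest pair can also be pulled out programmatically. This is a minimal sketch on made-up data: the `toy` frame and the `strongest_feature_pair` helper are hypothetical names of mine, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the screening answers (hypothetical values, not the real data).
toy = pd.DataFrame({
    'bored':      [1, 0, 1, 1, 0, 0, 1, 0],
    'read_faces': [1, 0, 1, 0, 0, 0, 1, 1],
    'hyperfocus': [1, 1, 0, 1, 0, 1, 0, 1],
    'class/asd':  [1, 0, 1, 0, 0, 0, 1, 0],
})

def strongest_feature_pair(frame, target):
    """Return (feature_a, feature_b, r) for the largest |r| among non-target columns."""
    corr = frame.drop(columns=[target]).corr()
    # Keep only the upper triangle so each pair appears once and the diagonal is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    best = pairs.abs().idxmax()
    return best[0], best[1], pairs.loc[best]

a, b, r = strongest_feature_pair(toy, 'class/asd')
print(a, b, round(r, 3))
```

Taking the absolute value before `idxmax` means a strong negative relationship would be picked up too.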

Using that methodology, I also went with the following features: 

`[read_faces]`, `[social_comp]`, `[multitask]`, `[interrupt]`, `[gender]`, `[age]`

For full disclosure, my goal for age was to be able to cluster or otherwise show more diversity in the graph for plotting, to allow me more options for sharing the data. Ultimately, I chose to stay with a tabular display, due to the information I wanted to share. As I learn more about showcasing relationships in different ways, I'll add visualizations and help clarify my analysis.

In [0]:
#split data

X = df[['bored', 'read_faces', 'social_comp', 'multitask', 'interrupt', 'gender', 'age']]
y = df['class/asd']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
#model creation
model = Pipeline([('xgb', xgb.XGBClassifier(random_state=42))])
#model fit
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy_score(y_test, y_pred)
# baseline with only 'bored' and 'read_faces' was 85.8%

0.9090909090909091

###Model Results Discussion/Next Steps

> The initial accuracy score, using only `bored` and `read_faces`, was 85.8%. After adding features one at a time, I was able to tune the model to get an accuracy score of 

###0.9090909090909091

> Meaning the model predicted the diagnosis correctly for 90.909% of the patients in the test set. *Not. Too. Shabby.* 

In [0]:
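#k-fold cross-validation accuracy for comparison

# random_state only applies when shuffle=True, and newer scikit-learn versions
# raise an error if it is set without shuffling, so it is left out here
kfold = KFold(n_splits=5)
results = cross_val_score(model, X, y, cv=kfold)
results.mean()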

0.9059878419452888



> As an additional check, I used 5-fold cross-validation, which trains and scores the model on five different splits of the data and averages the results, to see how stable the accuracy was. 

###0.9059878419452888

> The mean K-fold accuracy score was pretty dang close to the Accuracy Score. 
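
To make the cross-validation step more concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the data is randomly generated, `LogisticRegression` stands in for the XGBoost pipeline, and `shuffle=True` is used, which differs from the cell above (it does not shuffle).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic binary answers and labels (hypothetical, not the screening data).
rng = np.random.default_rng(42)
X_toy = rng.integers(0, 2, size=(200, 5))
# Label is 1 when at least three of the five answers are "yes".
y_toy = (X_toy.sum(axis=1) >= 3).astype(int)

# shuffle=True is what makes random_state meaningful for KFold.
kfold_toy = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(), X_toy, y_toy, cv=kfold_toy)

print(scores)         # one accuracy value per fold
print(scores.mean())  # the kind of single summary number reported above
```

Looking at the per-fold scores, not just their mean, shows how much the accuracy moves from one split to another.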

*My future goal is to create additional features based on the grading methodology for the assessment, as well as the previous diagnostic methods and how the same patients would have fared given the changes to the diagnostic process over time.*



#First Visualization / Interactive

In [0]:
# GRAPH DATA

questions = ['sounds', 'big_picture', 'multitask', 'interrupt', 
           'social_comp', 'bored', 'reading_int', 'hyperfocus', 'read_faces',
           'intentions']
df2 = pd.melt(df[questions], value_vars=questions)
df2 = df2[df2.value == 1]
fig = px.histogram(df2, x='variable', color='variable', title='Number of Patients Who Said Yes to Each Question')
fig.show()

Just for clarification before we dive into the higher-level view of accuracy scores, I've created an interactive graph to showcase the number of Yes's to each question. What's interesting is that `[bored]` had one of the highest correlations amongst the traits studied, while having the smallest share of Yes's. 
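
Because the answers are coded 0/1, the share of Yes's per question is just the column mean. A minimal sketch, using a made-up `toy_answers` frame (hypothetical values, not the real responses):

```python
import pandas as pd

# Hypothetical answers for three of the questions (1 = Yes, 0 = No).
toy_answers = pd.DataFrame({
    'bored':      [0, 0, 1, 0],
    'read_faces': [1, 0, 1, 1],
    'multitask':  [1, 1, 0, 0],
})

# The mean of a 0/1 column is the fraction of Yes answers; scale to a percentage.
pct_yes = (toy_answers.mean() * 100).sort_values()
print(pct_yes)
```

On the real data, the same idea would be `df[questions].mean() * 100`, which gives percentages directly instead of the raw counts the histogram shows.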

*I also want to note here that every Yes is a person saying "This is hurting my quality of life," and that matters. While I'm currently focused on autism, the weight of what each observation stands for stays with me. It's important for us to keep digging into the data, looking for answers, and finding ways to reveal them, so we can connect people with those who can help them.*


---


> My goal at this point is to create a visual that best showcases how the accuracy score and the K-fold accuracy line up with the actual `class/asd` value, as well as how patients responded to the correlated columns `[bored]` and `[read_faces]`. 

> My first thought was to create a plot showcasing the information, but no visual seemed to do it justice. I've decided that binary features are hard to compare visually against predictions. I also feel a tabular showcase will represent the results in a way that keeps the audience more engaged. 

#Second Visualization

In [0]:
col_names=['Actual Diagnosis', 'Gender']
comparison = pd.DataFrame([ y_test, X_test['gender']]).T
comparison.columns=col_names
#Baseline accuracy (hard-coded)
comparison['baseline'] = 0.8579545454545454
#Predicted by XGBoost Model
comparison['Prediction'] = y_pred
#Test-set accuracy score (hard-coded)
comparison['accuracy'] = 0.9034090909090909
#Mean K-fold cross-validation accuracy (hard-coded); the column keeps the name 'kmeans'
comparison['kmeans'] = 0.908855116514691

comparison = comparison[['Actual Diagnosis', 'Prediction', 'Gender', 'baseline', 'accuracy', 'kmeans']]


comparison.head()

Unnamed: 0,Actual Diagnosis,Prediction,Gender,baseline,accuracy,kmeans
495,0,0,1,0.857955,0.903409,0.908855
166,0,0,1,0.857955,0.903409,0.908855
54,1,1,0,0.857955,0.903409,0.908855
643,0,0,1,0.857955,0.903409,0.908855
609,0,0,0,0.857955,0.903409,0.908855


This is closer to what I want, but I'd like to make sure it's visually understandable without a guide. I'll set the following goals and then build the code to make them happen:

* Change `[Actual Diagnosis]`, `[Prediction]`, and `[Gender]` back from binary to descriptors
* Create an `[Accurate]` column that flags whether Actual and Prediction match, while still showing the actual data. 
* Sort the index in ascending order so rows are easier to compare with the original dataframe.



###Final Table

In [0]:
comparison['Accurate'] = comparison['Actual Diagnosis'] == comparison['Prediction']
comparison['Accurate'] = comparison['Accurate'].map({ False:0, True:1 })

male_preds = comparison[comparison['Gender'] == 0]
male_accurates = male_preds[male_preds['Actual Diagnosis'] == male_preds['Prediction']]
male_pct = male_accurates.shape[0] / male_preds.shape[0]
female_preds = comparison[comparison['Gender'] == 1]
female_accurates = female_preds[female_preds['Actual Diagnosis'] == female_preds['Prediction']]
female_pct = female_accurates.shape[0] / female_preds.shape[0]

In [0]:
comparison['Actual Diagnosis'] = comparison['Actual Diagnosis'].map({ 0:'No', 1:'Yes' })
comparison['Gender'] = comparison['Gender'].map({ 0:'Male', 1:'Female' })
comparison['Prediction'] = comparison['Prediction'].map({ 0:'No', 1:'Yes' })
comparison['Accurate'] = comparison['Accurate'].map({ 0:'No', 1:'Yes' })
comparison= comparison[['Actual Diagnosis', 'Prediction', 'Accurate', 'Gender', 'baseline', 'accuracy', 'kmeans']]

In [0]:
final_table=comparison.sort_index(ascending=True)

In [0]:
final_table.head(25) #change the number to see more of the df

Unnamed: 0,Actual Diagnosis,Prediction,Accurate,Gender,baseline,accuracy,kmeans
2,Yes,Yes,Yes,Male,0.857955,0.903409,0.908855
6,No,No,Yes,Female,0.857955,0.903409,0.908855
10,Yes,Yes,Yes,Male,0.857955,0.903409,0.908855
18,No,No,Yes,Female,0.857955,0.903409,0.908855
24,No,No,Yes,Male,0.857955,0.903409,0.908855
29,No,No,Yes,Male,0.857955,0.903409,0.908855
30,No,No,Yes,Male,0.857955,0.903409,0.908855
31,Yes,Yes,Yes,Female,0.857955,0.903409,0.908855
39,Yes,Yes,Yes,Female,0.857955,0.903409,0.908855
41,No,No,Yes,Female,0.857955,0.903409,0.908855


As soon as I finished building this dataframe, my immediate thought was 
*I wonder if there's a difference in the accuracy of the prediction for Male vs. Female!* My hope is that the layout of the table will also lead the reader to the same question themselves.

In [0]:
stats.ttest_ind(male_preds['Accurate'], female_preds['Accurate'])

Ttest_indResult(statistic=0.4774177331275378, pvalue=0.6336638285542556)

Based on our findings in our previous [studies](https://medium.com/@shmilyface/comparison-of-symptoms-of-autism-9d965bfefc3f), the data showed a much lower rate of diagnosis in women than in men, despite similar ratios of screening results. 

Our theory is that the accuracy of the model could differ by gender. 

The results: 

####T Statistic 0.47741 
####P Value 0.63363

With a p-value this high, the t-test finds no statistically significant difference in the model's accuracy between men and women. Or, more simply put, even if women are being under-diagnosed for autism, the model predicts their recorded diagnosis just as accurately. This means my model is also predicting bias: it learns from the recorded diagnoses, so it reproduces any bias they contain.
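
For readers who want to see the per-gender comparison spelled out, here is a minimal sketch on made-up predictions. The `toy` frame is hypothetical (0 = male, 1 = female, matching the encoding used earlier), and the numbers are not the notebook's results.

```python
import pandas as pd
from scipy import stats

# Hypothetical rows: Accurate is 1 when the prediction matched the diagnosis.
toy = pd.DataFrame({
    'Gender':   [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    'Accurate': [1, 1, 1, 1, 0, 1, 1, 1, 0, 1],
})

# Accuracy within each gender group (mean of a 0/1 column).
per_gender = toy.groupby('Gender')['Accurate'].mean()
print(per_gender)

# Two-sample t-test comparing the 0/1 accuracy flags of the two groups.
t_stat, p_value = stats.ttest_ind(
    toy.loc[toy['Gender'] == 0, 'Accurate'],
    toy.loc[toy['Gender'] == 1, 'Accurate'],
)
print(t_stat, p_value)
```

In this toy case both groups are right 80% of the time, so the test has nothing to detect. A high p-value only says that no difference was found in the sample; it does not prove that none exists.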

*My future goal will be removing result totals and creating a model that can predict the gender of the subject, based on the answers to the questions.* 



---




#Conclusion

The most interesting discovery for me in this was that my model maintained bias due to the method of prediction. In order to understand the discrepancy between the diagnosis of women with autism and their symptoms, I will be creating a dataset, utilizing several resources. The intention is to reach out to the autism community and have them answer these questions specifically, as well as an additional set of questions that may help me better pinpoint confounding variables (outside influences I haven't factored in) and adjust for them, so that the model's accuracy can be assessed properly. 

I'm also growing much more comfortable with trusting my data. While it would be wild to have an explosive analysis and discovery that showcased clear bias and its impact on female health, I know that the answers I'm finding continue to help me better advocate for the autism community. Knowing the common traits that were used to define an individual's diagnosis can help me understand more about what we can find patterns in, and what patterns we need to learn to identify, in order to continue towards a community of support, understanding, and inclusion. 

I'm excited to continue this as a side project, and will continue to chronicle my journey with this data. 