# Projects In Advanced Machine Learning

Hello! This notebook outlines the projects I worked on in Spring 2021 for the course Project in Advanced Machine Learning with Professor Michael Parrot. Below is a quick summary of the three main assignments, with links to their GitHub repositories. If you would like to learn more about what I did for this class, please feel free to reach out.

## 1. Predicting World Happiness 

[Visit the Repository](https://github.com/elliotttrio/advml/tree/master/Homework%201)

For our first class project, we were asked to build models that predict how "happy" a country is, using indicators from the UN World Happiness Report. Below is a preview of the dataset along with my best model.

In [3]:
import pandas as pd

# Load the 2019 World Happiness Report data.
happiness_df = pd.read_csv("datasets/worldhappiness2019.csv")

# Encode the happiness level as integer codes in ranked order: Very Low = 0 ... Very High = 4.
happiness_df['happiness_cat'] = happiness_df['Happiness_level'].astype("category")
happiness_df['happiness_cat'] = happiness_df['happiness_cat'].cat.reorder_categories(['Very Low', 'Low', 'Average', 'High', 'Very High'])
happiness_df['happiness_cat'] = happiness_df['happiness_cat'].cat.codes
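
The encoding in the cell above can also be written in one step. The sketch below is an alternative, not the code used in the project: it runs on a small hypothetical list of labels (`sample_levels`) instead of the real dataset, and uses `pd.Categorical` with an explicit category order so the integer codes follow the ranking from Very Low (0) to Very High (4).

```python
import pandas as pd

# Explicit order from least to most happy; the integer codes follow this order.
LEVELS = ["Very Low", "Low", "Average", "High", "Very High"]

# Hypothetical sample labels standing in for the Happiness_level column.
sample_levels = pd.Series(["Very High", "Very Low", "Average", "High"])

# ordered=True records that the categories are ranked, not just distinct.
encoded = pd.Categorical(sample_levels, categories=LEVELS, ordered=True)
codes = encoded.codes.tolist()
print(codes)
```

Passing the category list explicitly means a misspelled label becomes a code of -1 (missing) rather than silently creating a new category, which makes data problems easier to spot.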

In [4]:
happiness_df.head()

|   | Happiness_level | Country or region | GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption | name | region | sub-region | happiness_cat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Very High | Finland | 1.34 | 1.587 | 0.986 | 0.596 | 0.153 | 0.393 | Finland | Europe | Northern Europe | 4 |
| 1 | Very High | Denmark | 1.383 | 1.573 | 0.996 | 0.592 | 0.252 | 0.41 | Denmark | Europe | Northern Europe | 4 |
| 2 | Very High | Norway | 1.488 | 1.582 | 1.028 | 0.603 | 0.271 | 0.341 | Norway | Europe | Northern Europe | 4 |
| 3 | Very High | Iceland | 1.38 | 1.624 | 1.026 | 0.591 | 0.354 | 0.118 | Iceland | Europe | Northern Europe | 4 |
| 4 | Very High | Netherlands | 1.396 | 1.522 | 0.999 | 0.557 | 0.322 | 0.298 | Netherlands | Europe | Western Europe | 4 |


In [5]:
happiness_df.dtypes

Happiness_level                  object
Country or region                object
GDP per capita                  float64
Social support                  float64
Healthy life expectancy         float64
Freedom to make life choices    float64
Generosity                      float64
Perceptions of corruption       float64
name                             object
region                           object
sub-region                       object
happiness_cat                      int8
dtype: object

In [6]:
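# numeric_only=True (available from pandas 1.5) skips the text columns; pandas 2.0+ raises an error without it.
happiness_df.corr(numeric_only=True)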

|   | GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption | happiness_cat |
|---|---|---|---|---|---|---|---|
| GDP per capita | 1.0 | 0.754906 | 0.835462 | 0.379079 | -0.079662 | 0.29892 | 0.789635 |
| Social support | 0.754906 | 1.0 | 0.719009 | 0.447333 | -0.048126 | 0.181899 | 0.738577 |
| Healthy life expectancy | 0.835462 | 0.719009 | 1.0 | 0.390395 | -0.029511 | 0.295283 | 0.786932 |
| Freedom to make life choices | 0.379079 | 0.447333 | 0.390395 | 1.0 | 0.269742 | 0.438843 | 0.525097 |
| Generosity | -0.079662 | -0.048126 | -0.029511 | 0.269742 | 1.0 | 0.326538 | 0.028123 |
| Perceptions of corruption | 0.29892 | 0.181899 | 0.295283 | 0.438843 | 0.326538 | 1.0 | 0.32373 |
| happiness_cat | 0.789635 | 0.738577 | 0.786932 | 0.525097 | 0.028123 | 0.32373 | 1.0 |
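
One way to read the correlation matrix is to rank the indicators by how strongly they track `happiness_cat`. The sketch below copies the values from the `happiness_cat` column of the output above into a plain dictionary (the name `corr_with_target` is only for illustration) and sorts them by absolute value.

```python
# Correlation of each indicator with happiness_cat, copied from the output above.
corr_with_target = {
    "GDP per capita": 0.789635,
    "Social support": 0.738577,
    "Healthy life expectancy": 0.786932,
    "Freedom to make life choices": 0.525097,
    "Generosity": 0.028123,
    "Perceptions of corruption": 0.32373,
}

# Sort by absolute value so a strong negative relationship would also rank high.
ranked = sorted(corr_with_target.items(), key=lambda kv: abs(kv[1]), reverse=True)

for name, value in ranked:
    print(f"{name:<30} {value:+.3f}")
```

GDP per capita and healthy life expectancy come out on top, but they are also strongly correlated with each other (0.835), so they likely carry overlapping information. Generosity shows almost no linear relationship with the happiness category.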


### My Best Model

In [7]:
import aimodelshare as ai

INFO:tensorflow:Enabling eager execution
INFO:tensorflow:Enabling v2 tensorshape
INFO:tensorflow:Enabling resource variables
INFO:tensorflow:Enabling tensor equality
INFO:tensorflow:Enabling control flow v2


In [44]:
from aimodelshare.aws import set_credentials
un_apiurl = "https://z69mxrxdz5.execute-api.us-east-1.amazonaws.com/prod/m"

set_credentials(apiurl=un_apiurl,credential_file="credentials_updated.txt", type="submit_model", manual=False)

AI Model Share login credentials set successfully.
AWS credentials set successfully.


In [45]:
data=ai.get_leaderboard(un_apiurl, verbose=3)
data = data[data.username == 'eat2153']
ai.leaderboard.stylize_leaderboard(data.head(1))

|   | accuracy | f1_score | precision | recall | ml_framework | transfer_learning | deep_learning | model_type | depth | num_params | batchnormalization_layers | dense_layers | dropout_layers | relu_act | softmax_act | loss | optimizer | model_config | username | version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 47 | 44.23% | 43.69% | 48.61% | 45.70% | keras |  | True | Sequential | 4.0 | 11373.0 |  | 4.0 |  | 3.0 | 1.0 | str | SGD | {'name': 'sequential_11', 'lay... | eat2153 | 29 |


In [46]:
bestmodel = ai.aimsonnx.instantiate_model(un_apiurl, version = 29) 

bestmodel.summary()

Model: "sequential_11"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_38 (Dense)             (None, 104)               1248      
_________________________________________________________________
dense_39 (Dense)             (None, 80)                8400      
_________________________________________________________________
dense_40 (Dense)             (None, 20)                1620      
_________________________________________________________________
dense_41 (Dense)             (None, 5)                 105       
Total params: 11,373
Trainable params: 11,373
Non-trainable params: 0
_________________________________________________________________
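
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes 11 input features, a value inferred from the first layer (1248 / 104 - 1 = 11) rather than stated in the notebook; `dense_params` is a helper written for this illustration.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return (n_inputs + 1) * n_units

# Assumption: 11 input features, inferred from 1248 / 104 - 1.
# The remaining sizes are the unit counts shown in the summary.
layer_sizes = [11, 104, 80, 20, 5]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)
```

Most of the capacity sits in the second layer (104 to 80 units), which accounts for 8,400 of the 11,373 parameters.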


## 2. Computer Vision with Covid Positive X-Ray Image Data

[Visit the Repository](https://github.com/elliotttrio/advml/tree/master/Homework%202)


For the next project, I examined x-ray images of patients with and without Covid-19. I then developed computer vision models that classify whether an x-ray comes from a patient with the virus. Below is a preview of the images in the dataset along with the best model that I developed.

![](https://raw.githubusercontent.com/elliotttrio/advml/master/Homework%202/xray%20images.png)



### My Best Model

In [35]:
covid_apiurl = "https://sxr89y55o4.execute-api.us-east-1.amazonaws.com/prod/m"

set_credentials(apiurl= covid_apiurl,credential_file="credentials_updated.txt", type="submit_model", manual=False)

AI Model Share login credentials set successfully.
AWS credentials set successfully.


In [39]:
data=ai.get_leaderboard(covid_apiurl, verbose=3)
data = data[data.username == 'eat2153']
ai.leaderboard.stylize_leaderboard(data)

|   | accuracy | f1_score | precision | recall | ml_framework | transfer_learning | deep_learning | model_type | depth | num_params | activation_layers | add_layers | averagepooling2d_layers | batchnormalization_layers | concatenate_layers | conv2d_layers | dense_layers | dropout_layers | flatten_layers | globalaveragepooling2d_layers | inputlayer_layers | maxpooling2d_layers | zeropadding2d_layers | relu_act | sigmoid_act | softmax_act | loss | optimizer | model_config | username | version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 57 | nan% | nan% | nan% | nan% | keras | True | True | Sequential | 10 | 64899 |  |  |  |  |  | 6.0 | 1 |  | 1.0 |  |  | 2.0 |  | 6 |  | 1.0 | str | SGD | {'name': 'sequential_5', 'laye... | eat2153 | 61 |


In [42]:
bestmodel = ai.aimsonnx.instantiate_model(covid_apiurl, version = 61) 

bestmodel.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_38 (Conv2D)           (None, 98, 98, 32)        896       
_________________________________________________________________
conv2d_39 (Conv2D)           (None, 98, 98, 32)        1056      
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 49, 49, 32)        0         
_________________________________________________________________
conv2d_40 (Conv2D)           (None, 47, 47, 32)        9248      
_________________________________________________________________
conv2d_41 (Conv2D)           (None, 47, 47, 32)        1056      
_________________________________________________________________
max_pooling2d_11 (MaxPooling (None, 23, 23, 32)        0         
_________________________________________________________________
conv2d_42 (Conv2D)           (None, 21, 21, 32)       
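
The output shapes and parameter counts listed above follow from simple arithmetic. The sketch below reproduces them for the layers through `max_pooling2d_11`, under assumptions inferred from the summary rather than taken from the project code: a 100x100 RGB input, stride-1 convolutions without padding, alternating 3x3 and 1x1 kernels, and 2x2 pooling. The helper names are hypothetical.

```python
def conv2d(size, channels_in, kernel, filters):
    """Return (output size, output channels, params) for a stride-1, unpadded convolution."""
    out = size - kernel + 1
    params = (kernel * kernel * channels_in + 1) * filters  # +1 for each filter's bias
    return out, filters, params

def max_pool(size, pool=2):
    """Non-overlapping pooling floors the spatial size."""
    return size // pool

# Assumptions: 100x100 RGB input; kernel sizes inferred from the summary.
size, channels = 100, 3
rows = []
for kind, kernel in [("conv", 3), ("conv", 1), ("pool", 2),
                     ("conv", 3), ("conv", 1), ("pool", 2)]:
    if kind == "conv":
        size, channels, params = conv2d(size, channels, kernel, filters=32)
    else:
        size, params = max_pool(size, kernel), 0
    rows.append((kind, size, channels, params))

for row in rows:
    print(row)
```

Under these assumptions, the 1x1 convolutions keep the spatial size unchanged and add only 1,056 parameters each, while each 3x3 convolution trims two pixels from the height and width.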

## 3. Predicting Covid Misinformation on Twitter

[Visit the Repository](https://github.com/elliotttrio/advml/tree/master/Homework%203)


For my final project of the semester, I explored a dataset of Covid-19 tweets that are labeled as "real" or "fake." Using this data, I created predictive models using LSTM and embedding layers. Summary information about the dataset is included below along with my best model. Please visit the project's repo to see my attempt at generating tweets to validate my model.
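
Embedding and LSTM layers expect each tweet as a fixed-length sequence of integer word ids. The sketch below shows one simple way to produce such sequences in plain Python. It is an illustration under stated assumptions, not the preprocessing used in the project: the toy tweets, the vocabulary cap of 50, the sequence length of 8, and the helpers `build_vocab` and `encode` are all hypothetical.

```python
from collections import Counter

def build_vocab(texts, max_words):
    """Map the most frequent words to ids starting at 1; id 0 is reserved for padding."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common(max_words - 1))}

def encode(text, vocab, max_len):
    """Convert a text to ids, drop unknown words, truncate, then left-pad with zeros."""
    ids = [vocab[w] for w in text.lower().split() if w in vocab][:max_len]
    return [0] * (max_len - len(ids)) + ids

# Hypothetical toy inputs standing in for the tweet column of the dataset.
tweets = ["masks reduce spread", "a pandemic occurs every 100 years", "masks work"]
vocab = build_vocab(tweets, max_words=50)
encoded = [encode(t, vocab, max_len=8) for t in tweets]
print(encoded)
```

Every encoded tweet has the same length, so the list can be stacked into a single array and fed to an embedding layer, which looks up a learned vector for each id.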

In [47]:
trainingdata=pd.read_csv("https://raw.githubusercontent.com/diptamath/covid_fake_news/main/data/Constraint_Train.csv", usecols = ['tweet','label'])

In [48]:
trainingdata.head()

|   | tweet | label |
|---|---|---|
| 0 | The CDC currently reports 99031 deaths. In gen... | real |
| 1 | States reported 1121 deaths a small rise from ... | real |
| 2 | Politically Correct Woman (Almost) Uses Pandem... | fake |
| 3 | #IndiaFightsCorona: We have 1524 #COVID testin... | real |
| 4 | Populous states can generate large case counts... | real |


In [49]:
trainingdata[trainingdata.label == 'real'].sample(n=10)

|   | tweet | label |
|---|---|---|
| 122 | An important part of our work is data collecti... | real |
| 4206 | RT @ICMRDELHI: India has crossed the milestone... | real |
| 3005 | Schools could be forced to close partially or ... | real |
| 5633 | Scaling up testing is great news. But if we ca... | real |
| 5305 | The reported death toll was 1726 bringing our ... | real |
| 2971 | Did you know September is Sepsis Awareness Mon... | real |
| 1024 | Our combined total of confirmed and probable c... | real |
| 6328 | This chart looks at per-capita testing rates a... | real |
| 27 | Just Appendix B gathering all the state orders... | real |
| 6054 | This week CDC received 15 models to forecast p... | real |


In [50]:
trainingdata[trainingdata.label == 'fake'].sample(n=10)


|   | tweet | label |
|---|---|---|
| 4257 | The most up to date data in Victoria for 2020 ... | fake |
| 6235 | Jennifer Lopez Reveals That Since Sheltering-i... | fake |
| 3183 | Says a pandemic occurs exactly every 100 years. | fake |
| 4223 | Tamil Nadu Govt ordered to re-open TASMAC bars. | fake |
| 4713 | A photo claiming that SARS-CoV-2 is an old vir... | fake |
| 108 | This is an image of a suspected coronavirus va... | fake |
| 4051 | @IvankaTrump Chinese Virologist Dr. #LiMengYan... | fake |
| 851 | Trump said at his press briefing that anyone w... | fake |
| 2986 | 1. 2Brilliant business model\nMore than masks,... | fake |
| 2087 | "It's NOT a SECOND WAVE of COVID-19 coming soo... | fake |


### My Best Model

In [51]:
tweets_apiurl = "https://wvr23l2z9i.execute-api.us-east-1.amazonaws.com/prod/m"

set_credentials(apiurl= tweets_apiurl,credential_file="credentials_updated.txt", type="submit_model", manual=False)

AI Model Share login credentials set successfully.
AWS credentials set successfully.


In [54]:
data=ai.get_leaderboard(tweets_apiurl, verbose=3)
data = data[data.username == 'eat2153']
ai.leaderboard.stylize_leaderboard(data.head(1))

|   | accuracy | f1_score | precision | recall | ml_framework | transfer_learning | deep_learning | model_type | depth | num_params | bidirectional_layers | conv1d_layers | dense_layers | embedding_layers | flatten_layers | globalmaxpooling1d_layers | lstm_layers | maxpooling1d_layers | simplernn_layers | relu_act | sigmoid_act | softmax_act | tanh_act | loss | optimizer | model_config | username | version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 39 | 93.36% | 93.36% | 93.34% | 93.43% | keras | False | True | Sequential | 6 | 361442 |  | 1.0 | 1 | 1 |  | 1.0 | 2.0 |  |  | 1.0 |  | 1.0 | 2.0 | str | RMSprop | {'name': 'sequential', 'layers... | eat2153 | 87 |


In [56]:
bestmodel = ai.aimsonnx.instantiate_model(tweets_apiurl, version = 87) 

bestmodel.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 140, 32)           320000    
_________________________________________________________________
lstm (LSTM)                  (None, 140, 64)           24832     
_________________________________________________________________
lstm_1 (LSTM)                (None, 140, 32)           12416     
_________________________________________________________________
conv1d (Conv1D)              (None, 137, 32)           4128      
_________________________________________________________________
global_max_pooling1d (Global (None, 32)                0         
_________________________________________________________________
dense (Dense)                (None, 2)                 66        
Total params: 361,442
Trainable params: 361,442
Non-trainable params: 0
__________________________________________________
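
The parameter counts in this summary can be reproduced from the standard formulas for each layer type. The sketch below assumes a vocabulary of 10,000 words (320,000 / 32) and a Conv1D kernel size of 4 (140 - 137 + 1). Both values are inferred from the summary rather than stated in the notebook, and the helper functions are written only for this illustration.

```python
def embedding_params(vocab_size, dim):
    """One vector of length dim per vocabulary entry."""
    return vocab_size * dim

def lstm_params(n_inputs, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (n_inputs + units + 1) * units

def conv1d_params(kernel, channels_in, filters):
    """Kernel weights per filter plus one bias per filter."""
    return (kernel * channels_in + 1) * filters

def dense_params(n_inputs, units):
    """Weights plus one bias per unit."""
    return (n_inputs + 1) * units

# Assumptions inferred from the summary: vocabulary of 10,000 and kernel size 4.
counts = {
    "embedding": embedding_params(10_000, 32),
    "lstm": lstm_params(32, 64),
    "lstm_1": lstm_params(64, 32),
    "conv1d": conv1d_params(4, 32, 32),
    "dense": dense_params(32, 2),
}
total = sum(counts.values())
print(counts, total)
```

The embedding table holds 320,000 of the 361,442 parameters, so the vocabulary size and embedding width dominate the size of this model far more than the recurrent layers do.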

### **Works Cited**

Shahi, Gautam Kishore, Anne Dirkson, and Tim A. Majchrzak. "An exploratory study of covid-19 misinformation on twitter." Online Social Networks and Media 22 (2021): 100104.

Chowdhury, M.E.H., T. Rahman, A. Khandakar, R. Mazhar, M.A. Kadir, Z.B. Mahbub, K.R. Islam, M.S. Khan, A. Iqbal, N. Al-Emadi, and M.B.I. Reaz. "Can AI help in screening Viral and COVID-19 pneumonia?" arXiv preprint, 29 March 2020. https://arxiv.org/abs/2003.13145.