# Predicting Year of Marriage

## End-to-End Machine Learning Deployment with Flask and AWS

PART 1: Model Building and Hosting a Local API
1. Data Preparation
2. Machine Learning Modelling
3. Model Evaluation
4. Export Trained Model
5. Local REST API with a Flask web server
6. Create a website that calls the REST API to predict marriage age

PART 2: Deploying a Public API to an AWS EC2 Server and Launching the Website Service
1. Spin up an EC2 server
2. Configure the EC2 instance with a security group and a private key
3. Install libraries and dependencies on the EC2 server
4. Move the trained model and the Flask app.py file to EC2 (using WinSCP)
5. Configure the flaskapp.wsgi file and the Apache vhost file
6. Restart the Apache web server and check the API status
7. Launch the website on a domain name and host the web page



In [1]:
import pandas as pd
data = pd.read_csv('age_of_marriage_data.csv')
print(data.shape)
data.head()

(2567, 10)


| | id | gender | height | religion | caste | mother_tongue | profession | location | country | age_of_marriage |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | female | 5'4" | NaN | others | Telugu | NaN | London | United Kingdom | 21.0 |
| 1 | 2 | male | 5'7" | Jain | Shwetamber | Gujarati | Doctor / Healthcare Professional | Fairfax- VA | USA | 32.0 |
| 2 | 3 | male | 5'7" | Hindu | Brahmin | Hindi | Entrepreneurs / Business | Begusarai | India | 32.0 |
| 3 | 4 | female | 5'0" | Hindu | Thakur | Hindi | Architect | Mumbai | India | 30.0 |
| 4 | 5 | male | 5'5" | Christian | Born Again | Malayalam | Sales Professional / Marketing | Sulthan Bathery | India | 30.0 |


In [76]:
data.isnull().sum()

id                   0
gender              29
height             118
religion           635
caste              142
mother_tongue      164
profession         330
location           155
country             16
age_of_marriage     19
dtype: int64

In [2]:
(data.shape[0] - data.dropna().shape[0])/data.shape[0]

0.24737047136735488
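The cell above measures the share of rows that `dropna()` would discard, about 24.7% here. A minimal equivalent, sketched under the assumption that only this fraction is needed, asks pandas directly whether each row has any missing value; the helper name `missing_row_fraction` and the demo frame are hypothetical.

```python
import pandas as pd


def missing_row_fraction(df: pd.DataFrame) -> float:
    """Fraction of rows with at least one missing value, i.e. the rows dropna() removes."""
    # isnull() flags each cell; any(axis=1) collapses that to one flag per row
    return float(df.isnull().any(axis=1).mean())


# Small illustrative frame (hypothetical data, not from the dataset)
demo = pd.DataFrame({"gender": ["male", None, "female", "male"],
                     "religion": ["Hindu", "Jain", None, "Hindu"]})
print(missing_row_fraction(demo))  # 0.5
```

Applied to the notebook's `data` before `dropna`, this is algebraically the same quantity as the expression in `In [2]`.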

In [3]:
data.dropna(inplace=True)

In [4]:
data.shape

(1932, 10)

In [5]:
data.head(2)

| | id | gender | height | religion | caste | mother_tongue | profession | location | country | age_of_marriage |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | male | 5'7" | Jain | Shwetamber | Gujarati | Doctor / Healthcare Professional | Fairfax- VA | USA | 32.0 |
| 2 | 3 | male | 5'7" | Hindu | Brahmin | Hindi | Entrepreneurs / Business | Begusarai | India | 32.0 |


In [6]:
data.profession.unique()

array(['Doctor / Healthcare Professional', 'Entrepreneurs / Business ',
       'Architect', 'Sales Professional / Marketing', 'Sportsman',
       'Banking Professional', 'Software Professional', 'HR Professional',
       'Finance Professional', 'Not Specified', 'Not working',
       'Chartered Accountant', 'Logistics and Travel Professional',
       'Defense Services', 'Team Member / Staff',
       'Managers and Senior Executives', 'Admin Professional',
       'Accounting Professional (Others)', 'Investment Professional',
       'Civil Engineer', 'Consultant / Supervisor / Team Leads',
       'Public Relations Professional', 'Training Professional (Others)',
       'Hotel & Hospitality Professional (Others)',
       'Software Professional (Others)', 'Nurse', 'Artist (Others)',
       'Non IT Engineer (Others)', 'Event Manager',
       'Marketing Professional', 'Science Professional (Others)',
       'Mechanical / Production Engineer', 'Research Assistant',
       'Electronics / Telecom

In [7]:
X = data.loc[:,['gender','height','religion','caste','mother_tongue','country']]
y = data.age_of_marriage

In [8]:
X.head()

| | gender | height | religion | caste | mother_tongue | country |
|---|---|---|---|---|---|---|
| 1 | male | 5'7" | Jain | Shwetamber | Gujarati | USA |
| 2 | male | 5'7" | Hindu | Brahmin | Hindi | India |
| 3 | female | 5'0" | Hindu | Thakur | Hindi | India |
| 4 | male | 5'5" | Christian | Born Again | Malayalam | India |
| 5 | male | 5'5" | Hindu | Valmiki | Hindi | India |


In [9]:
from sklearn.preprocessing import LabelEncoder
enc = LabelEncoder()
cat_cols = ['gender','religion','caste','mother_tongue','country']
# Replace each categorical column with integer codes (labels sorted alphabetically)
X[cat_cols] = X[cat_cols].apply(enc.fit_transform)
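One caveat with the encoding cell: `apply(enc.fit_transform)` refits the same `LabelEncoder` on every column in turn, so afterwards `enc` only remembers the classes of the last column (`country`). The REST API in Part 1 has to encode incoming requests with the same mappings used in training, so one option, sketched here as an assumption rather than the notebook's method, is to keep one fitted encoder per column. `fit_encoders` and the sample rows are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def fit_encoders(df: pd.DataFrame, columns):
    """Label-encode each column with its own LabelEncoder.

    Returns an encoded copy of df and a dict {column: fitted encoder}.
    """
    encoded = df.copy()  # leave the caller's frame untouched
    encoders = {}
    for col in columns:
        enc = LabelEncoder()
        encoded[col] = enc.fit_transform(df[col])  # classes are sorted alphabetically
        encoders[col] = enc
    return encoded, encoders


# Hypothetical rows shaped like the categorical part of X
rows = pd.DataFrame({"gender": ["male", "female", "male"],
                     "religion": ["Jain", "Hindu", "Christian"]})
encoded, encoders = fit_encoders(rows, ["gender", "religion"])
print(encoded["religion"].tolist())               # [2, 1, 0]
print(encoders["religion"].transform(["Hindu"]))  # [1]
```

The `encoders` dict could be saved with `joblib.dump` next to the model so the API can call `encoders[col].transform(...)` on request values. Note that `transform` raises a `ValueError` for labels it never saw during fitting, so the API would need to decide how to handle unseen categories.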

In [10]:
X.head()

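| | gender | height | religion | caste | mother_tongue | country |
|---|---|---|---|---|---|---|
| 1 | 1 | 5'7" | 2 | 34 | 6 | 19 |
| 2 | 1 | 5'7" | 1 | 14 | 8 | 5 |
| 3 | 0 | 5'0" | 1 | 36 | 8 | 5 |
| 4 | 1 | 5'5" | 0 | 13 | 13 | 5 |
| 5 | 1 | 5'5" | 1 | 38 | 8 | 5 |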


In [11]:
int(X.loc[1,'height'].split('\'')[0])*30.48

152.4

In [12]:
int(X.loc[1,'height'].split('\'')[1].replace('"',''))*2.54

17.78

In [13]:
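def h_cms(h):
    # Convert a feet/inches string such as 5'7" to centimetres (1 ft = 30.48 cm, 1 in = 2.54 cm)
    feet, inches = h.split('\'')[0], h.split('\'')[1].replace('"','')
    return int(feet)*30.48 + int(inches)*2.54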
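`h_cms` assumes every value looks exactly like `5'7"`; a value with no inches part (for example `6'`) makes `int('')` raise a `ValueError`, and a missing value would fail on `.split`. A slightly more defensive variant, assuming heights are always written as feet followed by optional inches, parses with a regular expression and returns `NaN` for anything it cannot read, so those rows can be dropped afterwards. The name `height_to_cm` is hypothetical.

```python
import math
import re

# Feet, an apostrophe, then optional inches with an optional closing double quote
_HEIGHT_RE = re.compile(r"""^\s*(\d+)\s*'\s*(?:(\d+)\s*"?)?\s*$""")


def height_to_cm(h):
    """Convert a feet/inches string such as 5'7" to centimetres; NaN if unparseable."""
    if not isinstance(h, str):
        return math.nan
    m = _HEIGHT_RE.match(h)
    if m is None:
        return math.nan
    feet = int(m.group(1))
    inches = int(m.group(2) or 0)  # "6'" has no inches part
    return feet * 30.48 + inches * 2.54


print(round(height_to_cm("5'7\""), 2))  # 170.18
```

In the notebook this would be used as `X['height_cms'] = X.height.apply(height_to_cm)` followed by dropping the `NaN` rows.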

In [14]:
X['height_cms'] = X.height.apply(h_cms)

In [15]:
X.head()

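| | gender | height | religion | caste | mother_tongue | country | height_cms |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 5'7" | 2 | 34 | 6 | 19 | 170.18 |
| 2 | 1 | 5'7" | 1 | 14 | 8 | 5 | 170.18 |
| 3 | 0 | 5'0" | 1 | 36 | 8 | 5 | 152.4 |
| 4 | 1 | 5'5" | 0 | 13 | 13 | 5 | 165.1 |
| 5 | 1 | 5'5" | 1 | 38 | 8 | 5 | 165.1 |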


In [16]:
X.drop('height',inplace=True,axis=1)

In [17]:
X.head()

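| | gender | religion | caste | mother_tongue | country | height_cms |
|---|---|---|---|---|---|---|
| 1 | 1 | 2 | 34 | 6 | 19 | 170.18 |
| 2 | 1 | 1 | 14 | 8 | 5 | 170.18 |
| 3 | 0 | 1 | 36 | 8 | 5 | 152.4 |
| 4 | 1 | 0 | 13 | 13 | 5 | 165.1 |
| 5 | 1 | 1 | 38 | 8 | 5 | 165.1 |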


In [18]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=0)

In [19]:
from sklearn.ensemble import RandomForestRegressor
# No random_state is set, so the trained forest (and the scores below) vary slightly between runs
model = RandomForestRegressor(n_estimators=80,max_depth=11)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
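The evaluation scores come from a single 80/20 split, so they partly reflect which rows happened to land in the test set. A sketch of k-fold cross-validation with the same forest settings gives a spread as well as an average; the fold count, the seed, the synthetic stand-in data, and the helper name `cv_mae` are assumptions, not part of the original notebook.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def cv_mae(X, y, n_splits=5, seed=0):
    """Mean and standard deviation of MAE across k folds for the forest used above."""
    model = RandomForestRegressor(n_estimators=80, max_depth=11, random_state=seed)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # scikit-learn maximises scores, so MAE is exposed as its negative; flip the sign back
    scores = -cross_val_score(model, X, y, cv=folds, scoring="neg_mean_absolute_error")
    return float(scores.mean()), float(scores.std())


# Synthetic stand-in data; with the notebook you would call cv_mae(X, y) instead
X_demo, y_demo = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=0)
mean_mae, std_mae = cv_mae(X_demo, y_demo)
print(f"MAE {mean_mae:.2f} +/- {std_mae:.2f}")
```

A small standard deviation relative to the mean would suggest the single-split MAE is representative; a large one would suggest reporting the cross-validated figure instead.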

# Evaluation

In [20]:
from sklearn.metrics import mean_absolute_error, r2_score
print("MAE : ", mean_absolute_error(y_test,y_predict))
r2_score(y_test,y_predict)

MAE :  1.0264384988300685


0.703335412791039
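Both metrics printed above reduce to short formulas: MAE is the mean of the absolute errors, measured in years, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. Read that way, an MAE of about 1.03 means predictions miss the actual marriage age by roughly one year on average, and an R² of about 0.70 means the model accounts for about 70% of the variance in the test-set ages. The sketch below recomputes both with NumPy as a cross-check against `sklearn.metrics`; the toy arrays are illustrative only.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error: average of |y - y_hat|."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation around the mean
    return float(1.0 - ss_res / ss_tot)


# Toy ages (hypothetical), not the notebook's test set
y_true = [30, 25, 28, 32]
y_pred = [29, 26, 28, 30]
print(mae(y_true, y_pred))            # 1.0
print(round(r2(y_true, y_pred), 3))   # 0.776
```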

# Export model

In [22]:
# sklearn.externals.joblib was removed in scikit-learn 0.23; import the standalone joblib package instead
import joblib
joblib.dump(model,'marriage_age_predict_model.ml')

['marriage_age_predict_model.ml']
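With the model on disk, step 5 of Part 1 exposes it through Flask. A minimal sketch of such an `app.py` follows; the `/predict` route, the JSON field names, port 5000, and the `create_app` factory are assumptions for illustration, not the author's file. It also assumes the client sends features that are already label-encoded, in the training column order (`gender`, `religion`, `caste`, `mother_tongue`, `country`, `height_cms`), because the notebook saves only the model and not its encoder.

```python
import pandas as pd
from flask import Flask, jsonify, request

# Column order of X at training time in this notebook
FEATURES = ["gender", "religion", "caste", "mother_tongue", "country", "height_cms"]


def create_app(model):
    """Build a Flask app that serves predictions from an already-loaded model."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # silent=True returns None instead of raising on a missing or invalid JSON body
        payload = request.get_json(force=True, silent=True)
        if not isinstance(payload, dict):
            payload = {}
        missing = [f for f in FEATURES if f not in payload]
        if missing:
            return jsonify({"error": f"missing fields: {missing}"}), 400
        try:
            # One-row frame with the same column names the model was fitted on
            row = pd.DataFrame([[float(payload[f]) for f in FEATURES]], columns=FEATURES)
        except (TypeError, ValueError):
            return jsonify({"error": "all fields must be numeric"}), 400
        age = float(model.predict(row)[0])
        return jsonify({"age_of_marriage": round(age, 1)})

    return app


if __name__ == "__main__":
    import joblib  # standalone joblib package

    model = joblib.load("marriage_age_predict_model.ml")
    create_app(model).run(port=5000)  # local development server only
```

Run locally with `python app.py` and POST JSON to `http://127.0.0.1:5000/predict`. Passing the model into the factory, rather than loading it at import time, keeps the route testable with Flask's test client and a stand-in model.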