### Question
If the victim of human trafficking is being forced to perform labor, can we predict the type of labor given the inputs available from our dataset?

#### Model
Given a set of inputs, we want to predict an outcome. Our base set includes the known outcome: the type of labor forced upon the victim, stored in the 'Labor_Type' feature. Can an unsupervised machine learning model or a neural network predict the type of labor?

#### Output
We need the model to predict one of 12 forced labor categories derived from our data and ETL process. The 13 source columns in the table below map to 12 codes because 'Other' and 'NotSpecified' share code 12.

### Table: Forced Labor Types Categorized
| Original Data | Data Code |
| ----- | ----- |
| typeOfLabourAgriculture | 1 |
| typeOfLabourAquafarming | 2 |
| typeOfLabourBegging | 3 |
| typeOfLabourConstruction | 4 |
| typeOfLabourDomesticWork | 5 |
| typeOfLabourHospitality | 6 |
| typeOfLabourIllicitActivities | 7 |
| typeOfLabourManufacturing | 8 |
| typeOfLabourMiningOrDrilling | 9 |
| typeOfLabourPeddling | 10 |
| typeOfLabourTransportation | 11 |
| typeOfLabourOther | 12 |
| typeOfLabourNotSpecified | 12 |
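
The table can also be written as a lookup dictionary. The sketch below is a hypothetical reconstruction for illustration, not the project's actual ETL code: the names `LABOR_TYPE_CODES` and `encode_labor_type` are assumptions. It also confirms that the 13 source columns collapse to 12 distinct codes.

```python
# Hypothetical lookup table mirroring the "Forced Labor Types Categorized" table.
LABOR_TYPE_CODES = {
    "typeOfLabourAgriculture": 1,
    "typeOfLabourAquafarming": 2,
    "typeOfLabourBegging": 3,
    "typeOfLabourConstruction": 4,
    "typeOfLabourDomesticWork": 5,
    "typeOfLabourHospitality": 6,
    "typeOfLabourIllicitActivities": 7,
    "typeOfLabourManufacturing": 8,
    "typeOfLabourMiningOrDrilling": 9,
    "typeOfLabourPeddling": 10,
    "typeOfLabourTransportation": 11,
    "typeOfLabourOther": 12,
    "typeOfLabourNotSpecified": 12,  # shares code 12 with "Other"
}


def encode_labor_type(column_name: str) -> int:
    """Return the integer code for a source column (KeyError if unknown)."""
    return LABOR_TYPE_CODES[column_name]


n_source_columns = len(LABOR_TYPE_CODES)
n_categories = len(set(LABOR_TYPE_CODES.values()))
print(n_source_columns, n_categories)  # prints: 13 12
```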

In [18]:
import os
import pandas as pd
import numpy as np

from sklearn.cluster import KMeans

In [19]:
model_df = pd.read_csv(os.path.join("../../Exports", "eda5.csv"))
model_df.head()

Unnamed: 0,yearOfRegistration,Datasource,gender,citizenship,isForcedLabour,ControlCategory,RecruiterCategory,ExploitType,Labor_Type,ageCategories
0,2012,1,1,27,1,2,2,2,5,7
1,2012,1,1,27,1,1,2,2,5,7
2,2012,1,1,27,1,2,2,2,5,7
3,2012,1,1,27,1,1,2,2,5,7
4,2012,1,1,27,1,1,2,2,5,7


In [20]:
forced_labor_df = model_df[model_df['isForcedLabour'] == 1]
forced_labor_df.head()

Unnamed: 0,yearOfRegistration,Datasource,gender,citizenship,isForcedLabour,ControlCategory,RecruiterCategory,ExploitType,Labor_Type,ageCategories
0,2012,1,1,27,1,2,2,2,5,7
1,2012,1,1,27,1,1,2,2,5,7
2,2012,1,1,27,1,2,2,2,5,7
3,2012,1,1,27,1,1,2,2,5,7
4,2012,1,1,27,1,1,2,2,5,7


In [21]:
forced_labor_df['isForcedLabour'].value_counts()

1    2137
Name: isForcedLabour, dtype: int64

In [22]:
forced_labor_df['Labor_Type'].value_counts()

4     810
12    568
8     304
5     229
2      91
1      69
3      66
Name: Labor_Type, dtype: int64
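
Only 7 of the 12 codes appear in the forced-labor subset, and the classes are unevenly sized. As a rough reference point for any model, the sketch below computes the accuracy of always guessing the most frequent category, using the counts copied from the output above. The variable names are illustrative.

```python
# Counts copied from the Labor_Type value_counts() output.
labor_type_counts = {4: 810, 12: 568, 8: 304, 5: 229, 2: 91, 1: 69, 3: 66}

total = sum(labor_type_counts.values())

# The most frequent category and the accuracy of always predicting it.
majority_class = max(labor_type_counts, key=labor_type_counts.get)
baseline_accuracy = labor_type_counts[majority_class] / total

# Codes from the 1-12 scheme that never occur in this subset.
missing_codes = sorted(set(range(1, 13)) - set(labor_type_counts))

print(total, majority_class, round(baseline_accuracy, 3), missing_codes)
# prints: 2137 4 0.379 [6, 7, 9, 10, 11]
```

A model would need to beat roughly 38% accuracy to improve on this trivial guess.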

In [23]:
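# Separate our data from our outcomes.
# NOTE: 'Labor_Type' (the outcome) and 'Unnamed: 0' (the saved row index) are
# not in the drop list, so both remain in data_df and are passed to the model.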
outcomes = forced_labor_df['Labor_Type']
data_df = forced_labor_df.drop(['isForcedLabour', 'yearOfRegistration', 'ExploitType', 'Datasource'], axis=1)
data_df.head()

Unnamed: 0,gender,citizenship,ControlCategory,RecruiterCategory,Labor_Type,ageCategories
0,1,27,2,2,5,7
1,1,27,1,2,5,7
2,1,27,2,2,5,7
3,1,27,1,2,5,7
4,1,27,1,2,5,7
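
The preview above still lists `Labor_Type` and `Unnamed: 0` among the model inputs. One possible way to keep the outcome and the row index out of the features is sketched below on a toy frame. The values are made up for illustration, and `toy_df`, `features`, and `target` are hypothetical names, not part of the notebook.

```python
import pandas as pd

# Toy frame with the same column names as data_df; values are made up.
toy_df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "gender": [1, 1, 0],
    "citizenship": [27, 27, 5],
    "ControlCategory": [2, 1, 2],
    "RecruiterCategory": [2, 2, 1],
    "Labor_Type": [5, 5, 4],
    "ageCategories": [7, 7, 3],
})

# Keep the outcome separately, then drop it (and the index column)
# so the model never sees the value it is meant to recover.
target = toy_df["Labor_Type"]
features = toy_df.drop(columns=["Unnamed: 0", "Labor_Type"])

print(list(features.columns))
```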


In [24]:
# Initialize a model with K = 12
labor_model = KMeans(n_clusters = 12, random_state = 52)
labor_model.fit(data_df)

# Get our model's predictions
labor_predictions = labor_model.predict(data_df)
print(labor_predictions)

[3 3 3 ... 2 2 2]


In [25]:
# Compare our model-predicted outcomes to known outcomes
accuracy_df = pd.DataFrame({"Actual Outcomes": outcomes, "Predicted Outcomes": labor_model.labels_})

accuracy_df

Unnamed: 0,Actual Outcomes,Predicted Outcomes
0,5,3
1,5,3
2,5,3
3,5,3
4,5,3
...,...,...
5261,12,2
5262,12,2
5263,12,2
5264,12,2


In [26]:
# Count rows where the actual category, offset by 1, equals the cluster ID.
# NOTE: KMeans cluster IDs (0-11) are arbitrary and are not aligned with the
# Labor_Type codes (1-12), so this count is not a meaningful accuracy.
matches = (accuracy_df["Actual Outcomes"] + 1) == accuracy_df["Predicted Outcomes"]
correct_count = int(matches.sum())

# Show how many we got correct
correct_count

140
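
KMeans assigns cluster IDs without any knowledge of `Labor_Type`, so cluster 3 has no built-in relationship to labor code 3 or 4. A common way to compare clusters against known labels is to map each cluster to the label that occurs most often inside it and count the matches (cluster purity). The sketch below shows the idea on made-up toy data. It is an illustration under those assumptions, not a result from this dataset, and the names `toy`, `contingency`, and `cluster_to_label` are hypothetical.

```python
import pandas as pd

# Made-up labels and cluster IDs, for illustration only.
toy = pd.DataFrame({
    "actual":  [5, 5, 5, 4, 4, 12, 12, 12],
    "cluster": [3, 3, 0, 0, 0, 2, 2, 3],
})

# Rows are clusters, columns are actual labels, cells are counts.
contingency = pd.crosstab(toy["cluster"], toy["actual"])

# Map each cluster to the label that occurs most often inside it.
cluster_to_label = contingency.idxmax(axis=1)

# Translate cluster IDs into labels and count agreements.
mapped = toy["cluster"].map(cluster_to_label)
matched = int((mapped == toy["actual"]).sum())
purity = matched / len(toy)

print(matched, purity)  # prints: 6 0.75
```

Because the mapping is chosen from the same rows it is scored on, purity is an optimistic figure and tends to rise as the number of clusters grows.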