#### This notebook collects user input, validates it against the model's expected features, and then predicts a <i>salary</i> from that input.

In [None]:
import pandas as pd
clean_data = "../data/Clean_Salary_Data.csv"
model_data = pd.read_csv(clean_data)
job_titles = model_data["Job Title"].unique().tolist()
job_titles

### Load the Random Forest Model

In [2]:
import joblib
rf_model = joblib.load("../models/random_forests_model.pkl")

### Input Guidelines
#### Age: (21 - 62)
#### Gender: (Male / Female / Other)
#### Education Level: (High School / Bachelor's Degree / Master's Degree / PhD)
#### Job Title: must be one of the titles in the job_titles list above
#### Years of Experience: (0 - 34)
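
The guidelines above can also be written as a single pure function, which makes the rules easy to check without typing into a prompt. This is a minimal sketch, not the notebook's own method: `validate_features` and the short `titles` list are hypothetical names introduced for illustration, and the real check would use the `job_titles` list loaded from the data.

```python
# Hypothetical helper: the notebook itself validates inside input() loops.
GENDERS = ["Male", "Female", "Other"]
EDUCATION_LEVELS = ["High School", "Bachelor's Degree", "Master's Degree", "PhD"]


def validate_features(age, gender, education_lvl, job_title, years, valid_titles):
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    if not 21 <= age <= 62:
        errors.append("Age must be between 21 and 62.")
    if gender not in GENDERS:
        errors.append("Gender must be Male, Female, or Other.")
    if education_lvl not in EDUCATION_LEVELS:
        errors.append("Education Level must be one of the specified options.")
    if job_title not in valid_titles:
        errors.append("Job Title must be one of the known titles.")
    if not 0 <= years <= 34:
        errors.append("Years of Experience must be between 0 and 34.")
    return errors


# Stand-in for the job_titles list built from the training data (assumption).
titles = ["Software Developer", "Data Analyst"]

print(validate_features(54, "Other", "High School", "Software Developer", 17, titles))
print(validate_features(19, "Other", "High School", "Astronaut", 40, titles))
```

The first call passes every rule and returns an empty list; the second breaks the age, job title, and experience rules and returns one message for each.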

### Input Prompt and Validation

In [3]:
genders= ["Male", "Female", "Other"]
education_level_options = ["High School", "Bachelor's Degree", "Master's Degree", "PhD"]

def input_features():
    age = 0
    while True:
        try:
            age = float(input("Enter Age (21 - 62): "))
        except ValueError:
            print("Invalid input.")
            continue  # re-prompt instead of range-checking a stale value
        if age < 21 or age > 62:
            print("Age must be a number between 21 and 62.")
        else:
            break


    gender = None
    while True:
        gender = str(input("Enter Gender (Male/ Female/ Other): ")).capitalize()
        if gender not in genders:
            print("Invalid input, gender must be Male, Female, or Other")
        else:
            break

    education_lvl = None
    while True:
        education_lvl = str(input("Enter Education Level (High School/ Bachelor's Degree/ Master's Degree/ PhD): "))
        if education_lvl not in education_level_options:
            print("Invalid input. Education Level must be one of the specified options.")
        else:
            break

    job_title = None
    while True:
        job_title = str(input("Enter Job Title (Must be a valid title): "))
        if job_title not in job_titles:
            print("Invalid input. Job Title must be one of the specified options.")
        else:
            break
    
    years = 0
    while True:
        try:
            years = float(input("Enter Years of Experience: "))
        except ValueError:
            print("Invalid input.")
            continue
        if years < 0 or years > 34:
            print("Years of Experience must be between 0 and 34 years.")
        else:
            break
    
    return age, gender, education_lvl, job_title, years

age, gender, education_lvl, job_title, years = input_features()

Invalid input. Job Title must be one of the specified options.
Years of Experience must be between 0 and 34 years.


### Store input in a dataframe

In [4]:
df = model_data.drop(["Salary", "Unnamed: 0", "Age Group"], axis = 1)

input_dict = {
    "Age": [age],
    "Gender": [gender],
    "Education Level": [education_lvl],
    "Job Title": [job_title],
    "Years of Experience": [years]
}

input_df = pd.DataFrame(input_dict)

In [5]:
# preview input data
input_df

Unnamed: 0,Age,Gender,Education Level,Job Title,Years of Experience
0,54.0,Other,High School,Software Developer,17.0


In [6]:
sample = input_df.to_json(orient='records')
print(sample)

[{"Age":54.0,"Gender":"Other","Education Level":"High School","Job Title":"Software Developer","Years of Experience":17.0}]


#### Best Solution I could think of at the time
##### Collecting User Input: Collected user input for Age, Gender, Education Level, Job Title, and Years of Experience.

##### Appending to Main DataFrame: Appended the user input as a new row to the main DataFrame used for training, creating a larger combined DataFrame for both training data and user input.

##### Preprocessing: Performed preprocessing, including label encoding and one-hot encoding, on the combined DataFrame, which includes both training data and user input.

##### Prediction: Made predictions using the trained model on the last row of the combined DataFrame, which represents the user input.
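
The four steps above can be sketched on toy data. This is an illustration under stated assumptions, not the notebook's pipeline: the three training rows are invented, and `pd.get_dummies` stands in for scikit-learn's `OneHotEncoder` to keep the example short. The idea is the same: because the user row is encoded together with the training rows, and validation guarantees its categories already occur in the training data, the last row ends up with the same columns as the training features.

```python
import pandas as pd

# Toy training data (invented values, for illustration only).
train = pd.DataFrame({
    "Age": [30.0, 45.0, 28.0],
    "Gender": ["Male", "Female", "Other"],
    "Job Title": ["Data Analyst", "Software Developer", "Data Analyst"],
})

# A validated user row whose categories already occur in the training data.
user_row = pd.DataFrame({
    "Age": [54.0],
    "Gender": ["Other"],
    "Job Title": ["Software Developer"],
})

# Append the user row, then encode the combined frame in one pass.
combined = pd.concat([train, user_row], ignore_index=True)
encoded = pd.get_dummies(
    combined, columns=["Gender", "Job Title"], drop_first=True, dtype=float
)

# The last row is the user input, aligned with the training columns.
user_features = encoded.tail(1)
print(list(encoded.columns))
print(user_features.iloc[0].tolist())
```

If the user supplied a category that never appears in the training data, the combined encoding would gain an extra column and no longer match what the model was trained on, which is why the job title is checked against `job_titles` first.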

In [7]:
# append input data to training data
input_row = input_df.iloc[0]
# DataFrame.append was deprecated and removed in pandas 2.0; pd.concat replaces it
df = pd.concat([df, input_df], ignore_index=True)


In [8]:
input_row.to_frame().T

Unnamed: 0,Age,Gender,Education Level,Job Title,Years of Experience
0,54.0,Other,High School,Software Developer,17.0


In [9]:
df.tail(1)

Unnamed: 0,Age,Gender,Education Level,Job Title,Years of Experience
1787,54.0,Other,High School,Software Developer,17.0


### Preprocessing

In [10]:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
label_encoder = LabelEncoder()
df["Education Level"] = label_encoder.fit_transform(df["Education Level"])

onehot_encoder = OneHotEncoder(drop="first", sparse_output=False)
onehot_encoded = onehot_encoder.fit_transform(df[["Gender", "Job Title"]])
# Create a DataFrame from the one-hot encoded array
onehot_df = pd.DataFrame(onehot_encoded, columns=onehot_encoder.get_feature_names_out(["Gender", "Job Title"]))
# Concatenate the one-hot encoded DataFrame with the rest of the features
df = pd.concat([df, onehot_df], axis=1)
# Drop the original "Gender" and "Job Title" columns
df = df.drop(["Gender", "Job Title"], axis=1)
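
One detail of the label-encoding step is worth spelling out: `LabelEncoder` assigns integer codes in sorted order of the distinct labels, not in order of educational attainment. A minimal pure-Python sketch of that mapping, assuming the column holds only the four levels listed in the input guidelines:

```python
# Assumption: the column contains exactly these four values.
education_level_options = ["High School", "Bachelor's Degree", "Master's Degree", "PhD"]

# Mimic LabelEncoder: codes follow the sorted order of the distinct labels.
codes = {
    label: code
    for code, label in enumerate(sorted(set(education_level_options)))
}

for label, code in codes.items():
    print(code, label)
```

Because the codes depend only on the set of distinct labels, appending a validated user row does not shift them, provided that education level already occurs in the training data.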

### Predicting Salary

In [11]:
salary_pred = rf_model.predict(df.tail(1))
# predict() returns a one-element array; index it before converting to int
print(f"Age: {int(age)}\nGender: {gender}\nEducation Level: {education_lvl}\nJob Title: {job_title}\nYears of Experience: {int(years)} years\n\nPredicted salary: {int(salary_pred[0])}")

Age: 54
Gender: Other
Education Level: High School
Job Title: Software Developer
Years of Experience: 17 years

Predicted salary: 156647


#### The model successfully predicted the salary from the user input.
<!-- ####<img src="input_image.png"/> -->