Lambda School Data Science

*Unit 2, Sprint 3, Module 1*

---


# Define ML problems

You will use your portfolio project dataset for all assignments this sprint.

## Assignment

Complete these tasks for your project, and document your decisions.

- [ ] Choose your target. Which column in your tabular dataset will you predict?
- [ ] Is your problem regression or classification?
- [ ] How is your target distributed?
    - Classification: How many classes? Are the classes imbalanced?
    - Regression: Is the target right-skewed? If so, you may want to log-transform the target.
- [ ] Choose your evaluation metric(s).
    - Classification: Is your majority class frequency at least 50% and below 70%? If so, accuracy alone can be a reasonable metric. Outside that range, accuracy can be misleading. Which evaluation metric will you choose, in addition to or instead of accuracy?
    - Regression: Will you use mean absolute error, root mean squared error, R^2, or other regression metrics?
- [ ] Choose which observations you will use to train, validate, and test your model.
    - Are some observations outliers? Will you exclude them?
    - Will you do a random split or a time-based split?
- [ ] Begin to clean and explore your data.
- [ ] Begin to choose which features, if any, to exclude. Would some features "leak" future information?
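
The target-distribution and metric questions in the checklist can be answered with a few lines of pandas. The following is a minimal sketch, not a prescribed method: the helper names and the toy series are hypothetical, and the 50% to 70% range is just the rule of thumb from the checklist.

```python
import numpy as np
import pandas as pd


def majority_class_frequency(target: pd.Series) -> float:
    """Return the share of rows that belong to the most common class."""
    return float(target.value_counts(normalize=True).iloc[0])


def accuracy_is_reasonable(target: pd.Series) -> bool:
    """Apply the checklist's rule of thumb: majority class >= 50% and < 70%."""
    freq = majority_class_frequency(target)
    return bool(0.5 <= freq < 0.7)


def describe_regression_target(target: pd.Series) -> dict:
    """Report skewness before and after log1p (assumes the target is >= 0)."""
    return {
        "skew": float(target.skew()),
        "skew_log1p": float(np.log1p(target).skew()),
    }


# Hypothetical toy data, only to illustrate the calls.
classes = pd.Series(["a", "a", "a", "b", "b"])           # majority class is 60%
prices = pd.Series([1, 2, 4, 8, 16, 32, 64, 128.0])      # long right tail

print(majority_class_frequency(classes))
print(accuracy_is_reasonable(classes))
print(describe_regression_target(prices))
```

A positive `skew` that shrinks toward zero in `skew_log1p` suggests that a log transform may help. With your own data, pass the target column (for example `df['your_target']`, a placeholder name) instead of the toy series.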

If you haven't found a dataset yet, do that today. [Review requirements for your portfolio project](https://lambdaschool.github.io/ds/unit2) and choose your dataset.

Some students worry: ***what if my model isn't “good”?*** In that case, [produce a detailed tribute to your wrongness. That is science!](https://twitter.com/nathanwpyle/status/1176860147223867393)

In [36]:
import pandas as pd

# Raise the display limits to 100 rows and 100 columns.
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 100)

In [37]:
df = pd.read_csv('../food_coded.csv')

In [38]:
df.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 125 entries, 0 to 124
Data columns (total 61 columns):
GPA                             123 non-null object
Gender                          125 non-null int64
breakfast                       125 non-null int64
calories_chicken                125 non-null int64
calories_day                    106 non-null float64
calories_scone                  124 non-null float64
coffee                          125 non-null int64
comfort_food                    124 non-null object
comfort_food_reasons            124 non-null object
comfort_food_reasons_coded      106 non-null float64
cook                            122 non-null float64
comfort_food_reasons_coded.1    125 non-null int64
cuisine                         108 non-null float64
diet_current                    124 non-null object
diet_current_coded              125 non-null int64
drink                           123 non-null float64
eating_changes                  122 non-null object
eating_chan

In [39]:
df.dtypes

GPA                              object
Gender                            int64
breakfast                         int64
calories_chicken                  int64
calories_day                    float64
calories_scone                  float64
coffee                            int64
comfort_food                     object
comfort_food_reasons             object
comfort_food_reasons_coded      float64
cook                            float64
comfort_food_reasons_coded.1      int64
cuisine                         float64
diet_current                     object
diet_current_coded                int64
drink                           float64
eating_changes                   object
eating_changes_coded              int64
eating_changes_coded1             int64
eating_out                        int64
employment                      float64
ethnic_food                       int64
exercise                        float64
father_education                float64
father_profession                object


In [40]:
df.isnull().sum()

GPA                              2
Gender                           0
breakfast                        0
calories_chicken                 0
calories_day                    19
calories_scone                   1
coffee                           0
comfort_food                     1
comfort_food_reasons             1
comfort_food_reasons_coded      19
cook                             3
comfort_food_reasons_coded.1     0
cuisine                         17
diet_current                     1
diet_current_coded               0
drink                            2
eating_changes                   3
eating_changes_coded             0
eating_changes_coded1            0
eating_out                       0
employment                       9
ethnic_food                      0
exercise                        13
father_education                 1
father_profession                3
fav_cuisine                      2
fav_cuisine_coded                0
fav_food                         2
food_childhood      
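
With 61 columns, the raw `isnull().sum()` listing is long. One way to focus on the columns that matter is to keep only those with gaps and sort them. This is a small sketch: `missing_report` is a hypothetical helper, and the toy frame reuses a few column names from this notebook with made-up values.

```python
import numpy as np
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and shares, most-missing first."""
    report = pd.DataFrame({
        "n_missing": frame.isnull().sum(),
        "share_missing": frame.isnull().mean(),
    })
    # Keep only the columns that actually have gaps.
    report = report[report["n_missing"] > 0]
    return report.sort_values("n_missing", ascending=False)


# Hypothetical stand-in for `df`; the values are invented for illustration.
toy = pd.DataFrame({
    "calories_day": [3.0, np.nan, np.nan, 2.0],
    "cook": [2.0, 3.0, np.nan, 1.0],
    "coffee": [1, 2, 2, 2],
})
print(missing_report(toy))
```

Calling `missing_report(df)` on the real frame would list the columns with missing values in descending order of missing count.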

In [41]:
df['weight'].isnull().sum()

2

In [42]:
# Drop the rows where weight is missing, then confirm none remain.
df = df.dropna(subset=['weight'])
df['weight'].isnull().sum()

0

In [43]:
# Replace every remaining NaN, in all columns, with the string 'Missing'.
df = df.fillna('Missing')
df.isnull().sum()

GPA                             0
Gender                          0
breakfast                       0
calories_chicken                0
calories_day                    0
calories_scone                  0
coffee                          0
comfort_food                    0
comfort_food_reasons            0
comfort_food_reasons_coded      0
cook                            0
comfort_food_reasons_coded.1    0
cuisine                         0
diet_current                    0
diet_current_coded              0
drink                           0
eating_changes                  0
eating_changes_coded            0
eating_changes_coded1           0
eating_out                      0
employment                      0
ethnic_food                     0
exercise                        0
father_education                0
father_profession               0
fav_cuisine                     0
fav_cuisine_coded               0
fav_food                        0
food_childhood                  0
fries         
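
Because `fillna('Missing')` is applied to the whole frame, numeric columns that had gaps (such as `calories_day`) now hold a string and can no longer be treated as numbers directly. If you would rather keep numeric columns numeric, one possible alternative is to fill only the non-numeric columns and leave numeric gaps for a later imputation step. This is a sketch under that assumption; `fill_text_columns` is a hypothetical helper and the toy values are invented.

```python
import numpy as np
import pandas as pd


def fill_text_columns(frame: pd.DataFrame, placeholder: str = "Missing") -> pd.DataFrame:
    """Fill gaps in non-numeric columns only; numeric columns are left untouched."""
    result = frame.copy()
    # Treat every non-numeric column as text (an assumption that fits survey data like this).
    text_cols = result.select_dtypes(exclude="number").columns
    result[text_cols] = result[text_cols].fillna(placeholder)
    return result


# Hypothetical stand-in for `df`.
toy = pd.DataFrame({
    "comfort_food": ["pizza", None],
    "calories_day": [3.0, np.nan],
})
filled = fill_text_columns(toy)
print(filled)
print(filled.dtypes)
```

The numeric NaN in `calories_day` is kept, so the column stays a float column and can be imputed later, for example inside a modeling pipeline.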

In [44]:
df.head()

Unnamed: 0,GPA,Gender,breakfast,calories_chicken,calories_day,calories_scone,coffee,comfort_food,comfort_food_reasons,comfort_food_reasons_coded,cook,comfort_food_reasons_coded.1,cuisine,diet_current,diet_current_coded,drink,eating_changes,eating_changes_coded,eating_changes_coded1,eating_out,employment,ethnic_food,exercise,father_education,father_profession,fav_cuisine,fav_cuisine_coded,fav_food,food_childhood,fries,fruit_day,grade_level,greek_food,healthy_feeling,healthy_meal,ideal_diet,ideal_diet_coded,income,indian_food,italian_food,life_rewarding,marital_status,meals_dinner_friend,mother_education,mother_profession,nutritional_check,on_off_campus,parents_cook,pay_meal_out,persian_food,self_perception_weight,soup,sports,thai_food,tortilla_calories,turkey_calories,type_sports,veggies_day,vitamins,waffle_calories,weight
0,2.4,2,1,430,Missing,315,1,none,we dont have comfort,9,2,9,Missing,eat good and exercise,1,1,eat faster,1,1,3,3,1,1,5,profesor,Arabic cuisine,3,1,rice and chicken,2,5,2,5,2,looks not oily,being healthy,8,5,5,5,1,1,"rice, chicken, soup",1,unemployed,5,1,1,2,5,3,1,1,1,1165,345,car racing,5,1,1315,187
1,3.654,1,1,610,3,420,2,"chocolate, chips, ice cream","Stress, bored, anger",1,3,1,1,I eat about three times a day with some snacks...,2,2,I eat out more than usual.,1,2,2,2,4,1,2,Self employed,Italian,1,1,"chicken and biscuits, beef soup, baked beans",1,4,4,4,5,"Grains, Veggies, (more of grains and veggies),...",Try to eat 5-6 small meals a day. While trying...,3,4,4,4,1,2,"Pasta, steak, chicken",4,Nurse RN,4,1,1,4,4,3,1,1,2,725,690,Basketball,4,2,900,155
2,3.3,1,1,720,4,420,2,"frozen yogurt, pizza, fast food","stress, sadness",1,1,1,3,"toast and fruit for breakfast, salad for lunch...",3,1,sometimes choosing to eat fast food instead of...,1,3,2,3,5,2,2,owns business,italian,1,3,"mac and cheese, pizza, tacos",1,5,3,5,6,usually includes natural ingredients; nonproce...,i would say my ideal diet is my current diet,6,6,5,5,7,2,"chicken and rice with veggies, pasta, some kin...",2,owns business,4,2,1,3,5,6,1,2,5,1165,500,none,5,1,900,I'm not answering this.
3,3.2,1,1,430,3,420,2,"Pizza, Mac and cheese, ice cream",Boredom,2,2,2,2,"College diet, cheap and easy foods most nights...",2,2,Accepting cheap and premade/store bought foods,1,3,2,3,5,3,2,Mechanic,Turkish,3,1,"Beef stroganoff, tacos, pizza",2,4,4,5,7,"Fresh fruits& vegetables, organic meats","Healthy, fresh veggies/fruits & organic foods",2,6,5,5,2,2,Grilled chicken \rStuffed Shells\rHomemade Chili,4,Special Education Teacher,2,1,1,2,5,5,1,2,5,725,690,Missing,3,1,1315,"Not sure, 240"
4,3.5,1,1,720,2,420,2,"Ice cream, chocolate, chips","Stress, boredom, cravings",1,1,1,2,I try to eat healthy but often struggle becaus...,2,2,I have eaten generally the same foods but I do...,3,4,2,2,4,1,4,IT,Italian,1,3,"Pasta, chicken tender, pizza",1,4,4,4,6,"A lean protein such as grilled chicken, green ...",Ideally I would like to be able to eat healthi...,2,6,2,5,1,1,"Chicken Parmesan, Pulled Pork, Spaghetti and m...",5,Substance Abuse Conselor,3,1,1,4,2,4,1,1,4,940,500,Softball,4,2,760,190
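
The `weight` column in these rows mixes numbers with free text ("I'm not answering this.", "Not sure, 240"), so it cannot serve as a regression target until it is converted to numbers. The sketch below shows one way to do that, assuming `weight` is the intended target and that the first number in each entry is the weight; `parse_weight` is a hypothetical helper, and that assumption should be checked against the full column.

```python
import pandas as pd


def parse_weight(raw: pd.Series) -> pd.Series:
    """Pull the first number out of each entry; entries with no digits become NaN."""
    # Cast to str so every entry is handled the same way.
    first_number = raw.astype(str).str.extract(r"(\d+(?:\.\d+)?)", expand=False)
    return pd.to_numeric(first_number, errors="coerce")


# The five weight values shown in the df.head() output above.
weight = pd.Series(["187", "155", "I'm not answering this.", "Not sure, 240", "190"])
parsed = parse_weight(weight)
print(parsed)
```

Entries that contain no number come back as NaN and can then be dropped with `dropna`, as was done earlier for the rows with a missing weight.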
