# 1: Overview

In the previous mission, we learned how to explore a pandas DataFrame. In this mission, we'll learn how to manipulate a DataFrame and transform its columns. We'll continue to work with the same USDA data set on nutritional information, and we'll build a basic nutritional index for people who want to eat high-protein, low-fat foods. The "Lipid_Tot_(g)" column contains each food's total fat content (in grams), and the "Protein_(g)" column contains each food's total protein content (in grams). Let's use the following formula to score each food in our data set:

$$\text{Score} = 2 \times \text{Protein\_(g)} - 0.75 \times \text{Lipid\_Tot\_(g)}$$

While this formula is by no means scientific, it will act as a guide as we explore pandas further.
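
As a quick sanity check, the formula can be applied by hand to two rows from the sample output in this section. The sketch below assumes the score is twice the protein value minus 0.75 times the fat value, which is what the instructions later in this mission describe; `score` is a hypothetical helper, not part of pandas.

```python
def score(protein_g, fat_g):
    """Hypothetical helper: 2 x protein minus 0.75 x fat (both in grams)."""
    return 2 * protein_g - 0.75 * fat_g

# Protein and fat values copied from the sample rows of food_info.
butter = score(0.85, 81.11)        # BUTTER WITH SALT
blue_cheese = score(21.4, 28.74)   # CHEESE BLUE

print(round(butter, 4), round(blue_cheese, 4))
```

Butter scores far below zero because it is almost all fat, while blue cheese lands above zero thanks to its protein content.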


## Instructions

To practice what we learned in the previous mission:

1. Import the pandas library.
2. Read food_info.csv into a DataFrame object named food_info.
3. Use the DataFrame.columns attribute, followed by the Index.tolist() method, to return a list containing only the column names.
4. Assign the resulting list to col_names, and use the print() function to display the value.
5. Display the first three rows of food_info.


In [5]:
import pandas as pd
food_info = pd.read_csv("../data/food_info.csv")
food_info.head(3)

Unnamed: 0,NDB_No,Shrt_Desc,Water_(g),Energ_Kcal,Protein_(g),Lipid_Tot_(g),Ash_(g),Carbohydrt_(g),Fiber_TD_(g),Sugar_Tot_(g),...,Vit_A_IU,Vit_A_RAE,Vit_E_(mg),Vit_D_mcg,Vit_D_IU,Vit_K_(mcg),FA_Sat_(g),FA_Mono_(g),FA_Poly_(g),Cholestrl_(mg)
0,1001,BUTTER WITH SALT,15.87,717,0.85,81.11,2.11,0.06,0.0,0.06,...,2499.0,684.0,2.32,1.5,60.0,7.0,51.368,21.021,3.043,215.0
1,1002,BUTTER WHIPPED WITH SALT,15.87,717,0.85,81.11,2.11,0.06,0.0,0.06,...,2499.0,684.0,2.32,1.5,60.0,7.0,50.489,23.426,3.012,219.0
2,1003,BUTTER OIL ANHYDROUS,0.24,876,0.28,99.48,0.0,0.0,0.0,0.0,...,3069.0,840.0,2.8,1.8,73.0,8.6,61.924,28.732,3.694,256.0
3,1004,CHEESE BLUE,42.41,353,21.4,28.74,5.11,2.34,0.0,0.5,...,721.0,198.0,0.25,0.5,21.0,2.4,18.669,7.778,0.8,75.0
4,1005,CHEESE BRICK,41.11,371,23.24,29.68,3.18,2.79,0.0,0.51,...,1080.0,292.0,0.26,0.5,22.0,2.5,18.764,8.598,0.784,94.0


In [6]:
col_names = food_info.columns.tolist()
print( col_names )

['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)', 'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)', 'Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)', 'Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)', 'Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)', 'Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)', 'Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg', 'Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)', 'Cholestrl_(mg)']


# 2: Transforming a Column

We can use the arithmetic operators to transform a numerical column. The values in the "Iron_(mg)" column, for example, are currently in milligrams. We can divide each value by 1000 to convert the values to grams. The following code will divide each value in the "Iron_(mg)" column by 1000, and return a new Series object with those values:

```
div_1000 = food_info["Iron_(mg)"] / 1000
```

pandas allows us to use any of the arithmetic operators to scale the values in a numerical column:

Adds 100 to each value in the column and returns a Series object.

```
add_100 = food_info["Iron_(mg)"] + 100
```


Subtracts 100 from each value in the column and returns a Series object.

```
sub_100 = food_info["Iron_(mg)"] - 100
```

Multiplies each value in the column by 2 and returns a Series object.

```
mult_2 = food_info["Iron_(mg)"] * 2
```
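
To see the element-wise behavior on a small scale, here is a minimal sketch that uses a hand-built Series instead of the real column. The values in `iron_mg` are made up for illustration and are not taken from food_info.csv.

```python
import pandas as pd

# Hypothetical stand-in for food_info["Iron_(mg)"]; the values are made up.
iron_mg = pd.Series([0.02, 0.16, 0.31], name="Iron_(mg)")

# The operator is applied to every element and a new Series is returned.
iron_g = iron_mg / 1000

print(type(iron_g).__name__)   # the result is still a Series
print(iron_mg.tolist())        # the original Series is left unchanged
```

Because the operation returns a new Series, the original column keeps its values until we explicitly assign something back to it.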

## Instructions

1. Divide the "Sodium_(mg)" column by 1000 to convert the values to grams, and assign the result to sodium_grams.
2. Multiply the "Sugar_Tot_(g)" column by 1000 to convert to milligrams, and assign the result to sugar_milligrams.


In [7]:
sodium_grams = food_info["Sodium_(mg)"] / 1000
sugar_milligrams = food_info["Sugar_Tot_(g)"] * 1000

# 3: Performing Math with Multiple Columns

In addition to transforming columns by numerical values, we can transform columns by other columns. When we use an arithmetic operator between two columns (Series objects), pandas will perform that computation in a pair-wise fashion, and return a new Series object. It applies the arithmetic operator to the first value in both columns, the second value in both columns, and so on.

In the following code, we multiply the "Water_(g)" column by the "Energ_Kcal" column, and assign the resulting Series to water_energy:

```
water_energy = food_info["Water_(g)"] * food_info["Energ_Kcal"]
```
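
The pair-wise behavior is easier to see on a tiny DataFrame. The sketch below uses a hypothetical three-row frame (the first two rows borrow values from the sample output, and the missing value in the third row is made up) to show that each result lines up with its row, and that a missing value in either column produces a missing value in the result.

```python
import pandas as pd

# Hypothetical three-row stand-in for food_info; the NaN is made up.
df = pd.DataFrame({
    "Water_(g)": [15.87, 0.24, float("nan")],
    "Energ_Kcal": [717, 876, 353],
})

# Row 0 is multiplied with row 0, row 1 with row 1, and so on.
water_energy = df["Water_(g)"] * df["Energ_Kcal"]

print(water_energy)
```

The third entry of `water_energy` is NaN, since pandas cannot compute a product when one of the two inputs is missing.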

## Instructions

Assign the number of grams of protein per gram of water ("Protein_(g)" column divided by "Water_(g)" column) to grams_of_protein_per_gram_of_water.

Assign the total amount of calcium and iron ("Calcium_(mg)" column plus "Iron_(mg)" column) to milligrams_of_calcium_and_iron.


In [9]:
grams_of_protein_per_gram_of_water = food_info["Protein_(g)"] / food_info["Water_(g)"]
milligrams_of_calcium_and_iron = food_info["Calcium_(mg)"] + food_info["Iron_(mg)"]

# 4: Create a Nutritional Index

Now that we've learned how to transform columns with a numerical value and how to combine columns, we can use the following formula to create a nutritional index:

$$\text{Score} = 2 \times \text{Protein\_(g)} - 0.75 \times \text{Lipid\_Tot\_(g)}$$

## Instructions

1. Multiply the "Protein_(g)" column by two, and assign the resulting Series to weighted_protein.
2. Multiply the "Lipid_Tot_(g)" column by -0.75, and assign the resulting Series to weighted_fat.
3. Add both Series objects together and assign the result to initial_rating.


In [11]:
weighted_protein = food_info["Protein_(g)"] * 2
# The fat weight is negative, so adding the two Series subtracts the fat contribution.
weighted_fat = food_info["Lipid_Tot_(g)"] * -0.75
initial_rating = weighted_protein + weighted_fat
print(initial_rating[0:5])

0   -59.1325
1   -59.1325
2   -74.0500
3    21.2450
4    24.2200
dtype: float64


# 5: Normalizing Columns in a Data Set

The columns in the data set use different units (kilocalories, grams, milligrams, etc.). As a result, the range of values varies greatly between columns. For example, the "Vit_A_IU" column ranges from 0 to 100000, while the "Fiber_TD_(g)" column ranges from 0 to 79. In calculations that combine columns, a column like "Vit_A_IU" can have a far greater effect on the result simply because of the scale of its values.

While there are many ways to normalize data, one of the simplest ways is to divide all of the values in a column by that column's maximum value. This way, all of the columns will range from 0 to 1. To calculate the maximum value of a column, we use the Series.max() method. In the following code, we use the Series.max() method to calculate the largest value in the "Energ_Kcal" column, and assign it to max_calories:

The largest value in the "Energ_Kcal" column.

```
max_calories = food_info["Energ_Kcal"].max()
```

We can then use the division operator (/) to divide the values in the "Energ_Kcal" column by the maximum value, max_calories:

Divide the values in "Energ_Kcal" by the largest value.

```
normalized_calories = food_info["Energ_Kcal"] / max_calories
```
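
As a small check of the claim that the normalized values range from 0 to 1, the sketch below normalizes a hand-built Series. The calorie values are copied from the sample rows shown earlier, and `energ_kcal` is a hypothetical stand-in for the full column.

```python
import pandas as pd

# Calorie values from the first rows of the sample output (stand-in for the full column).
energ_kcal = pd.Series([717, 717, 876, 353, 371], name="Energ_Kcal")

max_calories = energ_kcal.max()
normalized_calories = energ_kcal / max_calories

# The row holding the maximum becomes exactly 1.0; every other row falls below it.
print(normalized_calories.max())
print(normalized_calories.round(3).tolist())
```

Note that dividing by the maximum only keeps values between 0 and 1 when the column has no negative values, which is the case for nutrient amounts.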

## Instructions

Normalize the values in the "Protein_(g)" column, and assign the result to normalized_protein.

Normalize the values in the "Lipid_Tot_(g)" column, and assign the result to normalized_fat.

In [14]:
max_protein = food_info["Protein_(g)"].max()
max_fat = food_info["Lipid_Tot_(g)"].max()

normalized_protein = food_info["Protein_(g)"] / max_protein
normalized_fat = food_info["Lipid_Tot_(g)"] / max_fat

# 6: Creating a New Column

So far, we've assigned the Series object that results from a column transform to a variable. However, we can add it to the DataFrame as a new column instead.

We use bracket notation to specify the name we want for the new column, then use the assignment operator (=) to specify the Series object containing the values we want to assign to that column:

```
iron_grams = food_info["Iron_(mg)"] / 1000  

food_info["Iron_(g)"] = iron_grams
```

The DataFrame food_info now includes the "Iron_(g)" column, which contains the values from iron_grams.
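
The intermediate variable is optional. As a minimal sketch on a hypothetical two-row frame (the iron values are made up), the transform can be assigned straight to the new column name, and the new column is appended after the existing ones:

```python
import pandas as pd

# Hypothetical two-row stand-in for food_info; the iron values are made up.
df = pd.DataFrame({
    "Shrt_Desc": ["BUTTER WITH SALT", "CHEESE BLUE"],
    "Iron_(mg)": [0.02, 0.31],
})

# Assigning to a column name that does not exist yet creates the column.
df["Iron_(g)"] = df["Iron_(mg)"] / 1000

print(df.columns.tolist())
```

Assigning to a name that already exists would overwrite that column instead of adding a new one.
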

## Instructions

1. Assign the normalized "Protein_(g)" column to a new column named "Normalized_Protein" in food_info.
2. Assign the normalized "Lipid_Tot_(g)" column to a new column named "Normalized_Fat" in food_info.


In [15]:
food_info["Normalized_Protein"] = normalized_protein
food_info["Normalized_Fat"] = normalized_fat

# 7: Create a Normalized Nutritional Index

Combining what you've learned so far, you can now create a nutritional index that uses the normalized fat and protein values, instead of the original values.

Here's the formula for reference:

$$\text{Score} = 2 \times \text{Normalized\_Protein} - 0.75 \times \text{Normalized\_Fat}$$

## Instructions

Use the Normalized_Protein and Normalized_Fat columns with the formula above to create the Norm_Nutr_Index column.


In [16]:
food_info["Normalized_Protein"] = food_info["Protein_(g)"] / food_info["Protein_(g)"].max()
food_info["Normalized_Fat"] = food_info["Lipid_Tot_(g)"] / food_info["Lipid_Tot_(g)"].max()

nor_protein = food_info["Normalized_Protein"] * 2
nor_fat = food_info["Normalized_Fat"] * 0.75
nor_rating = nor_protein - nor_fat

food_info["Norm_Nutr_Index"] = nor_rating

# 8: Sorting a DataFrame by a Column

The DataFrame currently appears in numerical order according to the NDB_No column. NDB_No is a unique USDA identifier that isn't really useful for our needs. To explore which foods rank the highest in the Norm_Nutr_Index column, we need to sort the DataFrame by that column. DataFrame objects have a sort_values() method that we can use to sort the entire DataFrame.

To sort the DataFrame on the Sodium_(mg) column, pass the column name to the DataFrame.sort_values() method:

```
food_info.sort_values("Sodium_(mg)")
```

By default, pandas will sort the data by the column we specify in ascending order and return a new DataFrame, rather than modifying food_info itself. To customize the method's behavior, use the parameters listed in the documentation:

Sorts the DataFrame in-place, rather than returning a new DataFrame.

```
food_info.sort_values("Sodium_(mg)", inplace=True)
```

Sorts by descending order, rather than ascending.

```
food_info.sort_values("Sodium_(mg)", inplace=True, ascending=False)
```
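
The difference between returning a new DataFrame and sorting in-place can be checked on a small example. The frame below is hypothetical (the food names and sodium values are made up), and `by_sodium` is just an illustrative variable name.

```python
import pandas as pd

# Hypothetical three-row stand-in for food_info; names and values are made up.
df = pd.DataFrame({
    "Shrt_Desc": ["FOOD A", "FOOD B", "FOOD C"],
    "Sodium_(mg)": [643, 2, 1146],
})

# Without inplace=True, the sorted result is a new DataFrame.
by_sodium = df.sort_values("Sodium_(mg)", ascending=False)

print(by_sodium["Shrt_Desc"].tolist())  # sorted order
print(df["Shrt_Desc"].tolist())         # original order is untouched
print(by_sodium.index.tolist())         # row labels travel with their rows
```

The last line shows that each row keeps its original index label after sorting, which is why a sorted DataFrame can display index values that are no longer in numerical order.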

## Instructions

Sort the food_info DataFrame in-place on the Norm_Nutr_Index column in descending order.


In [17]:
food_info.sort_values("Norm_Nutr_Index", inplace=True, ascending=False)

In [19]:
food_info.head( 2 )

Unnamed: 0,NDB_No,Shrt_Desc,Water_(g),Energ_Kcal,Protein_(g),Lipid_Tot_(g),Ash_(g),Carbohydrt_(g),Fiber_TD_(g),Sugar_Tot_(g),...,Vit_D_mcg,Vit_D_IU,Vit_K_(mcg),FA_Sat_(g),FA_Mono_(g),FA_Poly_(g),Cholestrl_(mg),Normalized_Protein,Normalized_Fat,Norm_Nutr_Index
4991,16423,SOY PROT ISOLATE K TYPE CRUDE PROT BASIS,4.98,321,88.32,0.53,3.58,2.59,2.0,0.0,...,0.0,0.0,0.0,0.066,0.101,0.258,0.0,1.0,0.0053,1.996025
6155,19177,GELATINS DRY PDR UNSWTND,13.0,335,85.6,0.1,1.3,0.0,0.0,0.0,...,0.0,0.0,0.0,0.07,0.06,0.01,0.0,0.969203,0.001,1.937656
