# Section 4: Working with DataFrames
DataFrames are the blockbuster data structure in Pandas. Many of the methods and techniques we learned for Series also apply to DataFrames, but they become even more powerful!
* We'll learn how to clean data and prepare it for analysis
* We'll also learn several Dataframe manipulation techniques

In [1]:
import pandas as pd
import numpy as np

## What is a DataFrame?
A **DataFrame** is quite simply a table of data made up of rows and columns. DataFrames are generally two-dimensional (as opposed to Series, which are one-dimensional), with labeled indices and columns. However, by using multiple index levels, they can also accommodate higher-dimensional data. More on that later.

What this practically means is that we need to specify more than one piece of information in order to identify a specific datapoint in the Dataframe. This contrasts with Series, in which only one piece of information is needed (usually the index label or index position).

The `.ndim` attribute gives the number of dimensions of a Series or DataFrame: Series have 1 dimension and DataFrames have 2.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.ndim.html
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.ndim.html

Recall the `.shape` attribute, which returns a tuple describing the dimensions of the DataFrame, that is, the number of rows and columns.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html
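A minimal sketch of `.ndim` and `.shape` on a small, made-up DataFrame (the names and ages are illustrative only):

```python
import pandas as pd

# A small, made-up DataFrame
df = pd.DataFrame({'name': ['Olga', 'Andrew', 'Brian'],
                   'age': [29, 21, 45]})

print(df['age'].ndim)  # 1: a single column is a one-dimensional Series
print(df.ndim)         # 2: the DataFrame itself is two-dimensional
print(df.shape)        # (3, 2): 3 rows and 2 columns
```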

Both Series and DataFrames are collections of values with associated labels. But whereas Series have labeled indices only, DataFrames have both labeled indices and labeled columns.

Each column in a DataFrame is actually a Series. That is, the Series is the data structure that makes up each column of a DataFrame.

Unlike Series, DataFrames can be *heterogeneous*. That is, each column of a DataFrame can have a completely different data type.
* DataFrames themselves DO NOT have a single data type, since they are a collection of (potentially different) Series. Thus, accessing the `dtype` attribute on a DataFrame raises an `AttributeError`.
* Instead, the `dtypes` attribute (note the plural) may be used on a DataFrame. It returns a Series with the column names as the index labels and each column's data type as the values.
 * https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dtypes.html
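The points above can be checked directly. This sketch uses a small, made-up DataFrame; the dtype displayed for the text column can differ between pandas versions, so the comments don't rely on it.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Olga', 'Andrew'],
                   'age': [29, 21],
                   'married': [False, True]})

print(type(df['age']))  # each column is a pandas Series
print(df.dtypes)        # a Series: column names as labels, dtypes as values

# A DataFrame has no single dtype, so .dtype raises AttributeError
try:
    df.dtype
except AttributeError as err:
    print('AttributeError:', err)
```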

## Creating a Dataframe

Just like Series, Dataframes can be created manually. The typical way to do this is to construct a list of each of your columns, and then combine them into a Dataframe.

In [2]:
names = ['Olga','Andrew','Brian','Telulah','Nicole','Tilda']
age = [29, 21, 45, 23, 39, 46]
married = [False, True, True, True, False, True]

After creating the lists, we can build a DataFrame using the `pd.DataFrame()` constructor. We pass in a dictionary whose keys are the column names and whose values are the lists we created.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

In [3]:
pd.DataFrame({'name': names, 'age': age, 'married': married})

|   | name | age | married |
|---|---|---|---|
| 0 | Olga | 29 | False |
| 1 | Andrew | 21 | True |
| 2 | Brian | 45 | True |
| 3 | Telulah | 23 | True |
| 4 | Nicole | 39 | False |
| 5 | Tilda | 46 | True |


Keep in mind that the lists you build your DataFrame from MUST be of equal length. If they are not, Pandas raises a `ValueError`.


In [4]:
# names2 = ['Olga','Andrew','Brian','Telulah','Nicole','Tilda', 'Ryan']
# pd.DataFrame({'name': names2, 'age': age, 'married': married})
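If you would rather see the error than comment the cell out, a sketch like the following catches the `ValueError` and prints Pandas' message (the exact wording may vary between versions):

```python
import pandas as pd

names2 = ['Olga', 'Andrew', 'Brian', 'Telulah', 'Nicole', 'Tilda', 'Ryan']  # 7 names
age = [29, 21, 45, 23, 39, 46]                                               # only 6 ages
married = [False, True, True, True, False, True]                             # only 6 values

error_message = None
try:
    pd.DataFrame({'name': names2, 'age': age, 'married': married})
except ValueError as err:
    error_message = str(err)
    print('ValueError:', error_message)
```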

## BONUS - Four More Ways to Build DataFrames

There are other ways to create DataFrames as well. The first is to build one from a **dictionary of tuples** (instead of a dictionary of lists).

In [5]:
tuple_names = tuple(names)
tuple_ages = tuple(age)
tuple_married = tuple(married)

In [6]:
pd.DataFrame({'name': tuple_names, 'age': tuple_ages, 'married': tuple_married})

|   | name | age | married |
|---|---|---|---|
| 0 | Olga | 29 | False |
| 1 | Andrew | 21 | True |
| 2 | Brian | 45 | True |
| 3 | Telulah | 23 | True |
| 4 | Nicole | 39 | False |
| 5 | Tilda | 46 | True |


The second alternative is to pass in a **dictionary of Pandas Series**. All we really need to do is build a set of Series and combine them using the `pd.DataFrame()` constructor.

In [7]:
series_names = pd.Series(data = names)
series_ages = pd.Series(data = age)
series_married = pd.Series(data = married)

In [8]:
pd.DataFrame({'name': series_names, 'age': series_ages, 'married': series_married})

|   | name | age | married |
|---|---|---|---|
| 0 | Olga | 29 | False |
| 1 | Andrew | 21 | True |
| 2 | Brian | 45 | True |
| 3 | Telulah | 23 | True |
| 4 | Nicole | 39 | False |
| 5 | Tilda | 46 | True |
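One thing worth knowing about the dictionary-of-Series approach: Pandas aligns the Series on their index labels, not on position. In the sketch below (made-up values), the two Series share only the label `1`, so the result contains the union of the labels and fills the gaps with `NaN`:

```python
import pandas as pd

# Made-up Series whose index labels only partly overlap
s_names = pd.Series(['Olga', 'Andrew'], index=[0, 1])
s_ages = pd.Series([45, 23], index=[1, 2])

df = pd.DataFrame({'name': s_names, 'age': s_ages})
print(df)
# The rows are the union of the labels (0, 1, 2); a label missing from a Series
# becomes NaN in that column, which is also why 'age' is upcast to float
```

Our `series_names`, `series_ages`, and `series_married` all share the default `0` to `5` index, which is why the DataFrame above lines up row for row.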


Slightly more challenging is to use a **dictionary of dictionaries**. The strategy:
1. Replicate the upper left section of the dataframe in order to understand what shape of data you are targeting
2. Then, extend that same structure to the rest of the data programmatically so that we don't have to type it all out

In [9]:
pd.DataFrame({'names' : {0: 'Olga', 1: 'Andrew'}})

|   | names |
|---|---|
| 0 | Olga |
| 1 | Andrew |


From this starter, we use the built-in Python function `enumerate()` to traverse the list. It yields each item together with its position, as an `(index, item)` tuple.

In [10]:
list(enumerate(names))

[(0, 'Olga'),
 (1, 'Andrew'),
 (2, 'Brian'),
 (3, 'Telulah'),
 (4, 'Nicole'),
 (5, 'Tilda')]

We'll use a dictionary comprehension to create key:value pairs from these enumerated names.

In [11]:
dict_names = {key:value for key, value in enumerate(names)}
dict_names

{0: 'Olga', 1: 'Andrew', 2: 'Brian', 3: 'Telulah', 4: 'Nicole', 5: 'Tilda'}

Let's take care of the rest of the columns, converting each of them to a dictionary of keys and values:

In [12]:
dict_ages = {key:value for key, value in enumerate(age)}
print(dict_ages)
dict_married = {key:value for key, value in enumerate(married)}
print(dict_married)

{0: 29, 1: 21, 2: 45, 3: 23, 4: 39, 5: 46}
{0: False, 1: True, 2: True, 3: True, 4: False, 5: True}


Note that we could have done this with a function, which streamlines things a bit. If you needed to change how the enumeration is carried out, you would only have to modify that one function instead of each conversion individually:

In [13]:
def convert_list_to_dict(l):
    # Map each item's position (0, 1, 2, ...) to the item itself
    return {key: value for key, value in enumerate(l)}

Now all we have to do is construct the DataFrame!

In [14]:
pd.DataFrame({'name': dict_names,
              'ages': dict_ages,
              'married': dict_married})

|   | name | ages | married |
|---|---|---|---|
| 0 | Olga | 29 | False |
| 1 | Andrew | 21 | True |
| 2 | Brian | 45 | True |
| 3 | Telulah | 23 | True |
| 4 | Nicole | 39 | False |
| 5 | Tilda | 46 | True |
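The `convert_list_to_dict()` function defined earlier is never actually called in these cells. A minimal sketch of how it could replace the three separate comprehensions, using the same data:

```python
import pandas as pd

names = ['Olga', 'Andrew', 'Brian', 'Telulah', 'Nicole', 'Tilda']
age = [29, 21, 45, 23, 39, 46]
married = [False, True, True, True, False, True]

def convert_list_to_dict(l):
    # Map each item's position (0, 1, 2, ...) to the item itself
    return {key: value for key, value in enumerate(l)}

# One helper call per column instead of three separate comprehensions
df = pd.DataFrame({'name': convert_list_to_dict(names),
                   'age': convert_list_to_dict(age),
                   'married': convert_list_to_dict(married)})
print(df)
```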


Finally, we can take a row-centric approach that builds a DataFrame row by row (as opposed to the column-centric approaches above, which build DataFrames column by column) by using a **list of dictionaries**.

Let's start by replicating a single row:

In [15]:
pd.DataFrame([{'name':'Olga', 'age':29, 'married': False}])

|   | name | age | married |
|---|---|---|---|
| 0 | Olga | 29 | False |


Next we use the built-in `zip()` function to combine multiple iterables (in this case `names`, `age`, and `married`) element by element; wrapping the result in `list()` gives us a list of tuples.

In [16]:
list(zip(names, age, married))

[('Olga', 29, False),
 ('Andrew', 21, True),
 ('Brian', 45, True),
 ('Telulah', 23, True),
 ('Nicole', 39, False),
 ('Tilda', 46, True)]

Now we have to build a list of dictionaries. Each row will be its own dictionary, and we can do this using `zip()` and a list comprehension.

In [17]:
# Loop variables get their own names so they are not confused with the `age` and `married` lists
rowwise = [{'name': name, 'age': a, 'married': m} for name, a, m in zip(names, age, married)]
rowwise

[{'age': 29, 'married': False, 'name': 'Olga'},
 {'age': 21, 'married': True, 'name': 'Andrew'},
 {'age': 45, 'married': True, 'name': 'Brian'},
 {'age': 23, 'married': True, 'name': 'Telulah'},
 {'age': 39, 'married': False, 'name': 'Nicole'},
 {'age': 46, 'married': True, 'name': 'Tilda'}]

Finally, we combine them into a DataFrame!

In [18]:
pd.DataFrame(rowwise)

|   | name | age | married |
|---|---|---|---|
| 0 | Olga | 29 | False |
| 1 | Andrew | 21 | True |
| 2 | Brian | 45 | True |
| 3 | Telulah | 23 | True |
| 4 | Nicole | 39 | False |
| 5 | Tilda | 46 | True |
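One caution about the `zip()` route: unlike the dictionary-of-lists approach, which raises a `ValueError` for lists of unequal length, `zip()` silently stops at the shortest iterable. The sketch below uses a seven-name `names2` list like the one in the commented-out cell earlier:

```python
import pandas as pd

names2 = ['Olga', 'Andrew', 'Brian', 'Telulah', 'Nicole', 'Tilda', 'Ryan']  # 7 names
age = [29, 21, 45, 23, 39, 46]                                               # 6 ages
married = [False, True, True, True, False, True]                             # 6 values

# zip() stops as soon as the shortest list runs out, so no error is raised
rows = [{'name': n, 'age': a, 'married': m} for n, a, m in zip(names2, age, married)]
df = pd.DataFrame(rows)
print(len(df))  # 6: 'Ryan' was silently dropped
```

On Python 3.10 and later, `zip(names2, age, married, strict=True)` raises a `ValueError` instead of truncating.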


There are additional methods for building DataFrames, but we won't be covering them here. Again, in practice you will primarily be reading data into Pandas instead of building it from within.

## The `info()` method


The `info()` method gives you a brief summary of what a DataFrame contains. It is one of the first things you should run when getting ready to work with a new DataFrame.

`info()` prints information including the index dtype, the column names and their dtypes, the counts of non-null values, and the memory usage.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html

In [19]:
df = pd.DataFrame({'name': names, 'age': age, 'married': married})
df

|   | name | age | married |
|---|---|---|---|
| 0 | Olga | 29 | False |
| 1 | Andrew | 21 | True |
| 2 | Brian | 45 | True |
| 3 | Telulah | 23 | True |
| 4 | Nicole | 39 | False |
| 5 | Tilda | 46 | True |


In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   name     6 non-null      object
 1   age      6 non-null      int64 
 2   married  6 non-null      bool  
dtypes: bool(1), int64(1), object(1)
memory usage: 230.0+ bytes


The above example was a small DataFrame, but for large DataFrames even the `info()` summary can get very long and overwhelming. If we set the `verbose` parameter to *False*, the individual column names will not be printed, and the dtypes will be summarized as counts instead of being listed for each column.

In [21]:
df.info(verbose = False)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Columns: 3 entries, name to married
dtypes: bool(1), int64(1), object(1)
memory usage: 230.0+ bytes


We can also set the `max_cols` parameter to limit the number of columns that the verbose summary will describe. If the DataFrame has more columns than this, `info()` switches to the non-verbose summary.

In [22]:
# info() prints its summary and returns None, so there is no need to wrap it in print()
df.info(max_cols=4)  # 3 columns <= 4: full per-column summary
print('')
df.info(max_cols=2)  # 3 columns > 2: condensed summary

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   name     6 non-null      object
 1   age      6 non-null      int64 
 2   married  6 non-null      bool  
dtypes: bool(1), int64(1), object(1)
memory usage: 230.0+ bytes

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Columns: 3 entries, name to married
dtypes: bool(1), int64(1), object(1)
memory usage: 230.0+ bytes
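When `max_cols` is not passed, `info()` falls back on the `display.max_info_columns` option (100 by default). A sketch that lowers the option temporarily with `pd.option_context()` and captures the summary through the `buf` parameter instead of printing it directly:

```python
import io
import pandas as pd

# The fallback threshold info() uses when max_cols is not passed
print(pd.get_option('display.max_info_columns'))

df = pd.DataFrame({'name': ['Olga'], 'age': [29], 'married': [False]})

# Temporarily lower the option so a 3-column DataFrame gets the condensed summary,
# and capture the text with buf= so it can be inspected as a string
buf = io.StringIO()
with pd.option_context('display.max_info_columns', 2):
    df.info(buf=buf)
summary = buf.getvalue()
print(summary)
```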


Perhaps the most useful parameter of `info()` is `memory_usage`, which controls how the DataFrame's size in memory is reported. By default, the estimate is based on each column's dtype and the number of rows, assuming that values of a given dtype all consume the same amount of memory. The `+` in `230.0+ bytes` signals that the true usage may be higher, because the contents of `object` columns are not counted. If we set `memory_usage` to *deep*, the method instead calculates the real memory usage through deep introspection.

Using *deep* `memory_usage` is slower and requires more computation.

In [23]:
df.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   name     6 non-null      object
 1   age      6 non-null      int64 
 2   married  6 non-null      bool  
dtypes: bool(1), int64(1), object(1)
memory usage: 557.0 bytes
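`info()` reports a single total; `DataFrame.memory_usage()` breaks the same estimate down by column and accepts a matching `deep` switch. A sketch on the same small DataFrame as above:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Olga', 'Andrew', 'Brian', 'Telulah', 'Nicole', 'Tilda'],
                   'age': [29, 21, 45, 23, 39, 46],
                   'married': [False, True, True, True, False, True]})

shallow = df.memory_usage()         # bytes per column (plus the index), based on dtypes
deep = df.memory_usage(deep=True)   # also measures the strings stored in 'name'
print(shallow)
print(deep)
print(shallow.sum(), deep.sum())
```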


## Reading in Nutritional Data

The remainder of this section relies on a sizeable dataset of roughly 9,000 food items and their nutritional information.

Data URL: https://andybek.com/pandas-nutrition

Import the data:

In [24]:
dataurl = 'https://andybek.com/pandas-nutrition'
nutrition = pd.read_csv(dataurl)
nutrition.head(10)

Unnamed: 0.1,Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
0,0,Cornstarch,100 g,381,0.1g,,0,9.00 mg,0.4 mg,0.00 mcg,0.00 mcg,0.000 mg,0.000 mg,0.000 mg,0.000 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,0.00 mcg,0.000 mg,0.0 mg,0.00 IU,0.00 mg,0.00 mg,0.0 mcg,2.00 mg,0.050 mg,0.47 mg,3.00 mg,0.053 mg,13.00 mg,3.00 mg,2.8 mcg,0.06 mg,0.26 g,0.019 g,0.012 g,0.020 g,0.006 g,0.053 g,0.009 g,0.008 g,0,0.010 g,0.036 g,0.006 g,0.006 g,0.013 g,0.024 g,0.012 g,0.009 g,0.001 g,0.010 g,0.014 g,91.27 g,0.9 g,0.00 g,0,0,0,0,0,0,0.05 g,0.009 g,0.016 g,0.025 g,0.00 mg,0.0 g,0.09 g,0.00 mg,0.00 mg,8.32 g
1,1,"Nuts, pecans",100 g,691,72g,6.2g,0,0.00 mg,40.5 mg,22.00 mcg,0.00 mcg,1.167 mg,0.863 mg,0.130 mg,0.660 mg,56.00 IU,3.00 mcg,0.00 mcg,29.00 mcg,9.00 mcg,17.00 mcg,0,0.00 mcg,0.210 mg,1.1 mg,0.00 IU,1.40 mg,1.40 mg,3.5 mcg,70.00 mg,1.200 mg,2.53 mg,121.00 mg,4.500 mg,277.00 mg,410.00 mg,3.8 mcg,4.53 mg,9.17 g,0.397 g,1.177 g,0.929 g,0.152 g,1.829 g,0.453 g,0.262 g,0,0.336 g,0.598 g,0.287 g,0.183 g,0.426 g,0.363 g,0.474 g,0.306 g,0.093 g,0.215 g,0.411 g,13.86 g,9.6 g,3.97 g,0.04 g,0,0.04 g,0.00 g,0.00 g,3.90 g,71.97 g,6.180 g,40.801 g,21.614 g,0.00 mg,0.0 g,1.49 g,0.00 mg,0.00 mg,3.52 g
2,2,"Eggplant, raw",100 g,25,0.2g,,0,2.00 mg,6.9 mg,22.00 mcg,0.00 mcg,0.649 mg,0.281 mg,0.037 mg,0.039 mg,23.00 IU,1.00 mcg,0.00 mcg,14.00 mcg,0.00 mcg,36.00 mcg,0,0.00 mcg,0.084 mg,2.2 mg,0.00 IU,0.30 mg,0.30 mg,3.5 mcg,9.00 mg,0.081 mg,0.23 mg,14.00 mg,0.232 mg,24.00 mg,229.00 mg,0.3 mcg,0.16 mg,0.98 g,0.051 g,0.057 g,0.164 g,0.006 g,0.186 g,0.041 g,0.023 g,0,0.045 g,0.064 g,0.047 g,0.011 g,0.043 g,0.043 g,0.042 g,0.037 g,0.009 g,0.027 g,0.053 g,5.88 g,3.0 g,3.53 g,1.54 g,0,1.58 g,0,0,0.26 g,0.18 g,0.034 g,0.016 g,0.076 g,0.00 mg,0.0 g,0.66 g,0.00 mg,0.00 mg,92.30 g
3,3,"Teff, uncooked",100 g,367,2.4g,0.4g,0,12.00 mg,13.1 mg,0,0,3.363 mg,0.942 mg,0.270 mg,0.390 mg,9.00 IU,0.00 mcg,0.00 mcg,5.00 mcg,0.00 mcg,66.00 mcg,0,0,0.482 mg,0,0,0.08 mg,0.08 mg,1.9 mcg,180.00 mg,0.810 mg,7.63 mg,184.00 mg,9.240 mg,429.00 mg,427.00 mg,4.4 mcg,3.63 mg,13.30 g,0.747 g,0.517 g,0.820 g,0.236 g,3.349 g,0.477 g,0.301 g,0,0.501 g,1.068 g,0.376 g,0.428 g,0.698 g,0.664 g,0.622 g,0.510 g,0.139 g,0.458 g,0.686 g,73.13 g,8.0 g,1.84 g,0.47 g,0.00 g,0.73 g,0.00 g,0.01 g,0.62 g,2.38 g,0.449 g,0.589 g,1.071 g,0,0,2.37 g,0,0,8.82 g
4,4,"Sherbet, orange",100 g,144,2g,1.2g,1mg,46.00 mg,7.7 mg,4.00 mcg,0.00 mcg,0.063 mg,0.224 mg,0.097 mg,0.027 mg,46.00 IU,12.00 mcg,0.00 mcg,1.00 mcg,5.00 mcg,7.00 mcg,0,0.13 mcg,0.023 mg,2.3 mg,0.00 IU,0.01 mg,0.01 mg,0.0 mcg,54.00 mg,0.028 mg,0.14 mg,8.00 mg,0.011 mg,40.00 mg,96.00 mg,1.5 mcg,0.48 mg,1.10 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30.40 g,1.3 g,24.32 g,0,0,0,0,0,0,2.00 g,1.160 g,0.530 g,0.080 g,1.00 mg,0.0 g,0.40 g,0.00 mg,0.00 mg,66.10 g
5,5,"Cauliflower, raw",100 g,25,0.3g,0.1g,0,30.00 mg,44.3 mg,57.00 mcg,0.00 mcg,0.507 mg,0.667 mg,0.060 mg,0.050 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,1.00 mcg,0,0.00 mcg,0.184 mg,48.2 mg,0.00 IU,0.08 mg,0.08 mg,15.5 mcg,22.00 mg,0.039 mg,0.42 mg,15.00 mg,0.155 mg,44.00 mg,299.00 mg,0.6 mcg,0.27 mg,1.92 g,0.116 g,0.086 g,0.177 g,0.020 g,0.257 g,0.071 g,0.056 g,0,0.071 g,0.106 g,0.217 g,0.020 g,0.065 g,0.071 g,0.086 g,0.076 g,0.020 g,0.051 g,0.125 g,4.97 g,2.0 g,1.91 g,0.97 g,0.00 g,0.94 g,0.00 g,0.00 g,0.00 g,0.28 g,0.130 g,0.034 g,0.031 g,0.00 mg,0.0 g,0.76 g,0.00 mg,0.00 mg,92.07 g
6,6,"Taro leaves, raw",100 g,42,0.7g,0.2g,0,3.00 mg,12.8 mg,126.00 mcg,0.00 mcg,1.513 mg,0.084 mg,0.456 mg,0.209 mg,4825.00 IU,241.00 mcg,0.00 mcg,2895.00 mcg,0.00 mcg,1932.00 mcg,0,0.00 mcg,0.146 mg,52.0 mg,0.00 IU,2.02 mg,2.02 mg,108.6 mcg,107.00 mg,0.270 mg,2.25 mg,45.00 mg,0.714 mg,60.00 mg,648.00 mg,0.9 mcg,0.41 mg,4.98 g,0,0.220 g,0,0.064 g,0,0,0.114 g,0,0.260 g,0.392 g,0.246 g,0.079 g,0.195 g,0,0,0.167 g,0.048 g,0.178 g,0.256 g,6.70 g,3.7 g,3.01 g,0,0,0,0,0,0,0.74 g,0.151 g,0.060 g,0.307 g,0.00 mg,0.0 g,1.92 g,0.00 mg,0.00 mg,85.66 g
7,7,"Lamb, raw, ground",100 g,282,23g,10g,73mg,59.00 mg,69.3 mg,18.00 mcg,0.00 mcg,5.960 mg,0.650 mg,0.210 mg,0.110 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,2.31 mcg,0.130 mg,0.0 mg,2.00 IU,0.20 mg,0.20 mg,3.6 mcg,16.00 mg,0.101 mg,1.55 mg,21.00 mg,0.019 mg,157.00 mg,222.00 mg,18.8 mcg,3.41 mg,16.56 g,0.996 g,0.984 g,1.457 g,0.198 g,2.402 g,0.809 g,0.524 g,0,0.799 g,1.288 g,1.462 g,0.425 g,0.674 g,0.694 g,0.615 g,0.709 g,0.193 g,0.556 g,0.893 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,23.41 g,10.190 g,9.600 g,1.850 g,73.00 mg,0.0 g,0.87 g,0.00 mg,0.00 mg,59.47 g
8,8,"Cheese, camembert",100 g,300,24g,15g,72mg,842.00 mg,15.4 mg,62.00 mcg,0.00 mcg,0.630 mg,1.364 mg,0.488 mg,0.028 mg,820.00 IU,241.00 mcg,0.00 mcg,12.00 mcg,0.00 mcg,0.00 mcg,0,1.30 mcg,0.227 mg,0.0 mg,18.00 IU,0.21 mg,0.21 mg,2.0 mcg,388.00 mg,0.021 mg,0.33 mg,20.00 mg,0.038 mg,347.00 mg,187.00 mg,14.5 mcg,2.38 mg,19.80 g,0.819 g,0.701 g,1.288 g,0.109 g,4.187 g,0.379 g,0.683 g,0,0.968 g,1.840 g,1.766 g,0.565 g,1.105 g,2.346 g,1.114 g,0.717 g,0.307 g,1.145 g,1.279 g,0.46 g,0.0 g,0.46 g,0,0,0,0,0,0,24.26 g,15.259 g,7.023 g,0.724 g,72.00 mg,0.0 g,3.68 g,0.00 mg,0.00 mg,51.80 g
9,9,Vegetarian fillets,100 g,290,18g,2.8g,0,490.00 mg,82.0 mg,102.00 mcg,0.00 mcg,12.000 mg,0,0.900 mg,1.100 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,4.20 mcg,1.500 mg,0.0 mg,0.00 IU,3.45 mg,3.45 mg,0.0 mcg,95.00 mg,0.925 mg,2.00 mg,23.00 mg,0,450.00 mg,600.00 mg,1.0 mcg,1.40 mg,23.00 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9.00 g,6.1 g,0.80 g,0,0,0,0,0,0,18.00 g,2.849 g,4.376 g,9.332 g,0.00 mg,0.0 g,5.00 g,0.00 mg,0.00 mg,45.00 g


This is a fairly large DataFrame. As the `info()` call below shows, it has 8,789 entries (rows) and 77 columns, and it takes up about 39.2 MB of memory when measured with `memory_usage='deep'`.

Note that most of the columns (74 of the 77) have the `object` dtype. This is because in many columns the units are embedded in the values (for example, `9.00 mg`), so Pandas cannot treat them as numbers. Instead, they are read in as strings, which Pandas stores with the `object` dtype.

In [25]:
nutrition.info(verbose = False, memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8789 entries, 0 to 8788
Columns: 77 entries, Unnamed: 0 to water
dtypes: int64(3), object(74)
memory usage: 39.2 MB
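A minimal sketch of why such columns end up non-numeric, using a few sodium values in the format shown above. The cleanup step (stripping a known ` mg` suffix and converting with `astype(float)`) is only an illustration; a real column may mix units or contain missing values.

```python
import pandas as pd

# A few sodium values in the same "number + unit" format as the nutrition data
sodium = pd.Series(['9.00 mg', '0.00 mg', '2.00 mg', '12.00 mg'])
print(sodium.dtype)  # a non-numeric (string) dtype because of the unit suffix

# Illustrative cleanup: strip the known ' mg' suffix, then convert to floats
sodium_mg = sodium.str.replace(' mg', '', regex=False).astype(float)
print(sodium_mg.dtype)  # float64
```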


## Cleanup: Removing Duplicated index
Let's take our first crack at cleaning data! If you look at the DataFrame, there is an extra column called `Unnamed: 0`, which appears to repeat the index labels. This is NOT the index, but rather just another column.

In [183]:
nutrition['Unnamed: 0']

We should remove this column, as it is not useful. One option is to use the `drop()` method: specify the name of the entity to be dropped as well as the `axis`. Remember that DataFrames are two-dimensional, so the axis on which to drop must be specified. After all, the name could refer to either an index label or a column name, and you have to tell Pandas which one it is: `axis=1` (or `axis='columns'`) means columns, while the default `axis=0` means index labels.

By default, this will return a new DataFrame with the indicated column(s) dropped.

In [182]:
nutrition.drop('Unnamed: 0', axis = 1)

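A small sketch of `drop()` on a made-up, three-row stand-in for the nutrition data (so it runs without downloading anything), including the equivalent `columns=` keyword:

```python
import pandas as pd

# A made-up, three-row stand-in for the nutrition data
df = pd.DataFrame({'Unnamed: 0': [0, 1, 2],
                   'name': ['Cornstarch', 'Nuts, pecans', 'Eggplant, raw'],
                   'calories': [381, 691, 25]})

dropped = df.drop('Unnamed: 0', axis=1)   # axis=1: look for the label among the columns
same = df.drop(columns='Unnamed: 0')      # equivalent keyword form
print(dropped.columns.tolist())           # ['name', 'calories']
print(df.columns.tolist())                # ['Unnamed: 0', 'name', 'calories']: df is unchanged
```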

Another option is to designate the unwanted column as the index using the `set_index()` method. That works for this example because the actual index and the unwanted column contain the same values. In practice, you will want to make sure that the column you set as the index is truly appropriate as an index.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html

Either way, the column you specify replaces the default integer index and is removed from the regular columns.

In [None]:
nutrition.set_index('Unnamed: 0')
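The same made-up stand-in DataFrame with `set_index()` instead; note that the column leaves the regular columns and its name becomes the index name:

```python
import pandas as pd

df = pd.DataFrame({'Unnamed: 0': [0, 1, 2],
                   'name': ['Cornstarch', 'Nuts, pecans', 'Eggplant, raw'],
                   'calories': [381, 691, 25]})

indexed = df.set_index('Unnamed: 0')
print(indexed.columns.tolist())  # ['name', 'calories']: the column moved into the index
print(indexed.index.name)        # Unnamed: 0
```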

A third approach, advocated by the instructor, is to recognize that the `Unnamed: 0` column was almost certainly an index for this dataset, and that we failed to notice that when reading in the data. If we do recognize it, we can let Pandas know in the `read_csv()` call that this column (or any column for that matter) is the index column by using the `index_col` parameter.

In this example, we pass the column's name, `index_col='Unnamed: 0'`. Passing its position instead, `index_col=0`, would work equally well, since it is the first column. Either way, that column becomes the index; if we do not specify `index_col`, Pandas creates a default integer index.

In [None]:
nutrition = pd.read_csv(dataurl, index_col='Unnamed: 0')
nutrition.head(10)
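To see `index_col` at work without the download, this sketch reads a hypothetical two-row excerpt shaped like the nutrition CSV (a first column with a blank header that repeats the row numbers) from an in-memory string via `io.StringIO`:

```python
import io
import pandas as pd

# Hypothetical excerpt: the first column has a blank header and repeats the row numbers
csv_text = ''',name,calories
0,Cornstarch,381
1,"Nuts, pecans",691
'''

raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())      # ['Unnamed: 0', 'name', 'calories']

fixed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(fixed.columns.tolist())    # ['name', 'calories']
```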

## The `sample()` Method

`sample()` is called on a DataFrame and returns a random record (row/observation) from that DataFrame; by default, it returns one.
* You can also use the `random_state` parameter, which seeds the random number generator. This doesn't let you pick a particular record, but it makes the selection reproducible: for a given random state, `sample()` returns the same record(s) each time.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html

In [30]:
nutrition.sample()

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
5579,"Pigeon peas (red gram), with salt, boiled, coo...",100 g,121,0.4g,0.1g,0,241.00 mg,0,111.00 mcg,0.00 mcg,0.781 mg,0.319 mg,0.059 mg,0.146 mg,3.00 IU,0.00 mcg,0,0,0,0,0,0.00 mcg,0.050 mg,0.0 mg,0.00 IU,0,0,0,43.00 mg,0.269 mg,1.11 mg,46.00 mg,0.501 mg,119.00 mg,384.00 mg,2.9 mcg,0.90 mg,6.76 g,0.303 g,0.405 g,0.669 g,0.078 g,1.568 g,0.250 g,0.241 g,0,0.245 g,0.483 g,0.474 g,0.076 g,0.579 g,0.298 g,0.320 g,0.239 g,0.066 g,0.168 g,0.292 g,23.25 g,6.7 g,0,0,0,0,0,0,0,0.38 g,0.083 g,0.003 g,0.205 g,0.00 mg,0,1.06 g,0,0,68.55 g


In [31]:
nutrition.sample(random_state=2)

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
7652,"Beef, raw, all grades, trimmed to 0"" fat, sepa...",100 g,121,2.9g,1.1g,62mg,54.00 mg,64.8 mg,4.00 mcg,0.00 mcg,6.462 mg,0.358 mg,0.236 mg,0.064 mg,6.00 IU,2.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,1.64 mcg,0.635 mg,0.0 mg,1.00 IU,0.23 mg,0.23 mg,1.5 mcg,13.00 mg,0.048 mg,2.34 mg,12.00 mg,0.004 mg,220.00 mg,313.00 mg,22.2 mcg,3.70 mg,23.59 g,1.540 g,1.730 g,2.491 g,0.259 g,4.206 g,1.111 g,0.957 g,0.119 g,1.203 g,2.219 g,2.480 g,0.685 g,1.028 g,1.083 g,1.047 g,1.212 g,0.289 g,0.963 g,1.271 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,2.94 g,1.120 g,1.292 g,0.223 g,62.00 mg,0.0 g,1.12 g,0.00 mg,0.00 mg,72.96 g


You can use a comparison operator to check that the same random state returns the same records every time.

In [32]:
nutrition.sample(random_state=2) == nutrition.sample(random_state=2)

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
7652,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True
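A caveat on the `==` check above: the comparison is element-wise, and `NaN` never equals `NaN`, so two identical samples containing missing values (the nutrition data has some, such as `saturated_fat` for Cornstarch) would show `False` in those cells. `DataFrame.equals()` treats `NaN`s in the same positions as equal. A sketch with made-up data, using `frac=1` so that every row, including those with `NaN`, is sampled:

```python
import numpy as np
import pandas as pd

# Made-up data with missing values
df = pd.DataFrame({'name': ['Cornstarch', 'Nuts, pecans', 'Eggplant, raw', 'Teff, uncooked'],
                   'saturated_fat': [np.nan, 6.2, np.nan, 0.4]})

# frac=1 shuffles all rows; the same random_state gives the same order both times
a = df.sample(frac=1, random_state=2)
b = df.sample(frac=1, random_state=2)

print((a == b).all().all())  # False: the NaN cells compare as unequal
print(a.equals(b))           # True: equals() treats NaNs in matching positions as equal
```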


You can choose how many records to return using the `n` parameter. It defaults to 1.


In [33]:
nutrition.sample(n = 5)

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
8438,"Infant formula, with DHA and ARA, not reconsti...",100 g,517,28g,14g,8mg,224.00 mg,40.7 mg,76.00 mcg,76.00 mcg,6.870 mg,3.817 mg,0.458 mg,0.305 mg,1528.00 IU,458.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,2.29 mcg,0.305 mg,45.8 mg,229.00 IU,10.24 mg,10.24 mg,76.3 mcg,534.00 mg,0.382 mg,9.16 mg,38.00 mg,0,382.00 mg,598.00 mg,9.3 mcg,3.82 mg,13.99 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,51.91 g,0.0 g,34.18 g,0,0,0,0,0,0,28.19 g,14.282 g,2.276 g,8.877 g,8.00 mg,0.0 g,4.41 g,0.00 mg,0.00 mg,1.50 g
3554,"KELLOGG'S, Honey Oat, Waffles, NUTRI-GRAIN, EGGO",100 g,268,8.6g,2g,0,544.00 mg,0,54.00 mcg,0,5.700 mg,0,0.490 mg,0.430 mg,1429.00 IU,429.00 mcg,0,0,0,0,0,1.70 mcg,0.570 mg,0,0.00 IU,0.00 mg,0.00 mg,0,184.00 mg,0,6.40 mg,26.00 mg,0,286.00 mg,139.00 mg,0,0.60 mg,6.60 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,44.10 g,3.8 g,10.50 g,0,0,0,0,0,0,8.60 g,2.000 g,2.200 g,3.700 g,0.00 mg,0,0,0,0,37.50 g
1644,"Beans, with franks, canned, baked",100 g,142,6.6g,2.4g,6mg,430.00 mg,34.9 mg,30.00 mcg,0.00 mcg,0.901 mg,0.139 mg,0.056 mg,0.058 mg,87.00 IU,4.00 mcg,0.00 mcg,52.00 mcg,0.00 mcg,13.00 mcg,0,0.34 mcg,0.046 mg,2.3 mg,0.00 IU,0.16 mg,0.16 mg,1.0 mcg,48.00 mg,0.213 mg,1.73 mg,28.00 mg,0.420 mg,104.00 mg,235.00 mg,6.5 mcg,1.87 mg,6.75 g,0.302 g,0.427 g,0.800 g,0.074 g,1.037 g,0.288 g,0.190 g,0,0.297 g,0.534 g,0.472 g,0.105 g,0.349 g,0.290 g,0.357 g,0.279 g,0.077 g,0.190 g,0.345 g,15.39 g,6.9 g,6.53 g,0,0,0,0,0,0,6.57 g,2.352 g,2.830 g,0.836 g,6.00 mg,0.0 g,1.95 g,0.00 mg,0.00 mg,69.34 g
8668,"Beef, braised, cooked, all grades, trimmed to ...",100 g,214,7.7g,2.7g,93mg,44.00 mg,129.5 mg,11.00 mcg,0.00 mcg,6.109 mg,0.669 mg,0.192 mg,0.077 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,1.88 mcg,0.456 mg,0.0 mg,5.00 IU,0.45 mg,0.45 mg,1.6 mcg,7.00 mg,0.084 mg,2.83 mg,23.00 mg,0.012 mg,214.00 mg,278.00 mg,38.3 mcg,5.82 mg,34.00 g,2.067 g,2.199 g,3.097 g,0.439 g,5.105 g,2.071 g,1.085 g,0.357 g,1.547 g,2.705 g,2.874 g,0.886 g,1.343 g,1.621 g,1.339 g,1.358 g,0.223 g,1.084 g,1.687 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,7.67 g,2.675 g,3.231 g,0.286 g,93.00 mg,0.0 g,1.01 g,0.00 mg,0.00 mg,58.09 g
156,"Spices, ground, mace",100 g,475,32g,9.5g,0,80.00 mg,0,76.00 mcg,0.00 mcg,1.350 mg,0,0.448 mg,0.312 mg,800.00 IU,40.00 mcg,0,0,0,0,0,0.00 mcg,0.160 mg,21.0 mg,0.00 IU,0,0,0,252.00 mg,2.467 mg,13.90 mg,163.00 mg,1.500 mg,110.00 mg,463.00 mg,2.7 mcg,2.30 mg,6.71 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,50.50 g,20.2 g,0,0,0,0,0,0,0,32.38 g,9.510 g,11.170 g,4.390 g,0.00 mg,0.0 g,2.23 g,0.00 mg,0.00 mg,8.17 g


You can also use the `frac` parameter to specify what fraction of the rows you want to sample at random; for example, `frac=0.01` samples 1% of the rows (roughly 88 of the 8,789 here). It cannot be used together with the `n` parameter. Think about why that makes sense.

In [34]:
nutrition.sample(frac = 0.01)

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
8062,"Beef, raw, choice, trimmed to 0"" fat, separabl...",100 g,175,11g,4.6g,73mg,81.00 mg,74.3 mg,3.00 mcg,0.00 mcg,3.567 mg,0.740 mg,0.200 mg,0.080 mg,6.00 IU,2.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,3.32 mcg,0.311 mg,0.0 mg,5.00 IU,0.17 mg,0.17 mg,1.5 mcg,11.00 mg,0.079 mg,2.53 mg,21.00 mg,0.014 mg,187.00 mg,313.00 mg,22.3 mcg,8.25 mg,19.38 g,1.120 g,1.303 g,1.784 g,0.206 g,3.156 g,0.863 g,0.639 g,0.098 g,0.848 g,1.604 g,1.743 g,0.565 g,0.755 g,0.798 g,0.761 g,0.878 g,0.222 g,0.687 g,0.896 g,0.29 g,0.0 g,0.00 g,0,0,0,0,0,0,10.70 g,4.550 g,5.723 g,0.427 g,73.00 mg,0.0 g,0.90 g,0.00 mg,0.00 mg,68.73 g
2670,"Fish, raw (Alaska Native), sheefish",100 g,115,2.8g,0.5g,56mg,52.00 mg,108.9 mg,9.00 mcg,0.00 mcg,2.120 mg,0.648 mg,0.126 mg,0.041 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,5.90 mcg,0.162 mg,0.0 mg,0,0.44 mg,0.44 mg,0.0 mcg,140.00 mg,0.148 mg,0.50 mg,25.00 mg,0.037 mg,300.00 mg,390.00 mg,42.3 mcg,0.60 mg,22.25 g,1.400 g,1.310 g,2.070 g,0.200 g,2.950 g,1.500 g,0.520 g,0,0.860 g,1.520 g,1.820 g,0.630 g,0.850 g,1.150 g,0.850 g,0.940 g,0.210 g,0.680 g,1.000 g,0.00 g,0.0 g,0.00 g,0.00 g,0.00 g,0.00 g,0.00 g,0.00 g,0.00 g,2.84 g,0.490 g,1.080 g,0.700 g,56.00 mg,0.0 g,1.58 g,0.00 mg,0.00 mg,74.62 g
5925,"MORNINGSTAR FARMS Mediterranean Chickpea, unpr...",100 g,200,6.5g,0.9g,1mg,357.00 mg,0,28.00 mcg,0.00 mcg,0.400 mg,0,0.200 mg,0.100 mg,0.00 IU,0,0,0,0,0,0,0.00 mcg,0.100 mg,0.0 mg,0.00 IU,0,0,0,94.00 mg,0,1.70 mg,18.00 mg,0,85.00 mg,261.00 mg,0,0.50 mg,15.50 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,19.80 g,10.1 g,1.10 g,0,0,0,0,0,0,6.50 g,0.900 g,2.000 g,3.300 g,1.00 mg,0,1.80 g,0,0,56.40 g
1211,"Gravy, dry, instant beef",100 g,369,9.5g,4.9g,11mg,5203.00 mg,47.8 mg,34.00 mcg,20.00 mcg,1.075 mg,1.218 mg,0.514 mg,0.187 mg,8.00 IU,2.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,2.00 mcg,0,0.62 mcg,0.138 mg,0.9 mg,0.00 IU,0.01 mg,0.01 mg,0.4 mcg,141.00 mg,0.146 mg,6.27 mg,41.00 mg,0.154 mg,239.00 mg,450.00 mg,18.0 mcg,0.76 mg,9.80 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,61.10 g,4.3 g,23.90 g,0,0,0,0,0,0,9.48 g,4.880 g,3.659 g,0.938 g,11.00 mg,0.0 g,14.97 g,0.00 mg,0.00 mg,4.65 g
2769,"Candies, chocolate-flavor roll, TOOTSIE ROLL",100 g,387,3.3g,1g,2mg,44.00 mg,0,9.00 mcg,3.00 mcg,0.210 mg,0.290 mg,0.070 mg,0.056 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,0.00 mcg,0.010 mg,0.0 mg,0,0.65 mg,0.65 mg,9.6 mcg,36.00 mg,0.130 mg,0.80 mg,22.00 mg,0.047 mg,57.00 mg,116.00 mg,2.2 mcg,0.40 mg,1.59 g,0.060 g,0.060 g,0.200 g,0.000 g,0.280 g,0.050 g,0.030 g,0.000 g,0.060 g,0.110 g,0.090 g,0.010 g,0.050 g,0.030 g,0.080 g,0.040 g,0.190 g,0.020 g,0.080 g,87.73 g,0.1 g,56.32 g,0.40 g,0.00 g,7.85 g,2.18 g,6.54 g,39.35 g,3.31 g,0.967 g,1.929 g,0.266 g,2.00 mg,0,0.68 g,7.00 mg,75.00 mg,6.69 g
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3900,"CAMPBELL'S Soup on the Go, Vegetable Beef Soup",100 g,20,0.3g,0.2g,2mg,305.00 mg,0,0,0,0,0,0,0,131.00 IU,0,0,0,0,0,0,0,0,0.0 mg,0,0,0,0,7.00 mg,0,0.12 mg,0,0,0,0,0,0,0.98 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3.28 g,0.3 g,1.64 g,0,0,0,0,0,0,0.33 g,0.164 g,0,0,2.00 mg,0,1.41 g,0,0,94.00 g
8586,"Beef, roasted, cooked, choice, trimmed to 1/8""...",100 g,359,29g,12g,83mg,63.00 mg,0,6.00 mcg,0.00 mcg,3.130 mg,0.240 mg,0.160 mg,0.060 mg,0.00 IU,0.00 mcg,0,0,0,0,0,2.85 mcg,0.240 mg,0.0 mg,0,0,0,0,13.00 mg,0.077 mg,2.39 mg,20.00 mg,0.013 mg,181.00 mg,319.00 mg,21.9 mcg,4.86 mg,22.28 g,1.344 g,1.408 g,2.035 g,0.249 g,3.347 g,1.215 g,0.763 g,0,1.002 g,1.761 g,1.853 g,0.570 g,0.870 g,0.984 g,0.852 g,0.973 g,0.249 g,0.748 g,1.083 g,0.00 g,0.0 g,0,0,0,0,0,0,0,29.21 g,11.780 g,12.630 g,1.070 g,83.00 mg,0,1.09 g,0,0,47.18 g
8392,"Beef, grilled, cooked, select, trimmed to 0"" f...",100 g,233,13g,4.3g,91mg,61.00 mg,60.4 mg,7.00 mcg,0.00 mcg,6.509 mg,0.674 mg,0.346 mg,0.078 mg,25.00 IU,8.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,2.70 mcg,0.515 mg,0.0 mg,5.00 IU,0.07 mg,0.07 mg,1.6 mcg,7.00 mg,0.106 mg,2.34 mg,24.00 mg,0.106 mg,173.00 mg,307.00 mg,34.3 mcg,8.33 mg,30.06 g,1.892 g,2.102 g,3.006 g,0.315 g,5.020 g,1.488 g,1.176 g,0.205 g,1.426 g,2.663 g,2.973 g,0.831 g,1.227 g,1.387 g,1.274 g,1.444 g,0.337 g,1.152 g,1.524 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,12.54 g,4.311 g,5.304 g,0.543 g,91.00 mg,0.0 g,1.08 g,0.00 mg,0.00 mg,57.16 g
7052,"Babyfood, baked whole grain corn snack, LIL CR...",100 g,503,29g,4.7g,13mg,71.00 mg,16.6 mg,19.00 mcg,0.00 mcg,1.291 mg,0.493 mg,0.099 mg,0.169 mg,265.00 IU,39.00 mcg,43.00 mcg,75.00 mcg,0.00 mcg,914.00 mcg,0,0.10 mcg,0.258 mg,0.0 mg,3.00 IU,8.57 mg,8.57 mg,9.8 mcg,285.00 mg,0.160 mg,6.40 mg,69.00 mg,0.312 mg,1044.00 mg,214.00 mg,12.1 mcg,3.57 mg,0.00 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,61.59 g,0.0 g,0.00 g,0,0,0,0,0,0,28.57 g,4.669 g,14.682 g,7.698 g,13.00 mg,0.0 g,7.24 g,0.00 mg,0.00 mg,2.60 g


In [35]:
nutrition.sample(frac = 0.01).info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 88 entries, 2935 to 3501
Data columns (total 76 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   name                         88 non-null     object
 1   serving_size                 88 non-null     object
 2   calories                     88 non-null     int64 
 3   total_fat                    88 non-null     object
 4   saturated_fat                70 non-null     object
 5   cholesterol                  88 non-null     object
 6   sodium                       88 non-null     object
 7   choline                      88 non-null     object
 8   folate                       88 non-null     object
 9   folic_acid                   88 non-null     object
 10  niacin                       88 non-null     object
 11  pantothenic_acid             88 non-null     object
 12  riboflavin                   88 non-null     object
 13  thiamin                      88 

## BONUS - Sampling with Replacement or Weights

The sampling method can get even fancier.
* **Sampling with replacement** means putting a record back into the population after it is selected, so the probability of selecting that record in later draws is unchanged.
* This makes it possible to pick the exact same record multiple times.
* Sampling with replacement is the basis of **bootstrapping** in statistics.


Within the `sample()` method, we can use the `replace` parameter to determine whether we sample with replacement. By setting `replace` to `True`, there is a chance to select the same record more than once.

In [36]:
nutrition.sample(n=3, replace = True)

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
882,"Gravy, dry, unspecified type",100 g,344,8g,2.9g,4mg,5730.00 mg,0,31.00 mcg,17.00 mcg,3.700 mg,0.100 mg,0.432 mg,0.200 mg,0.00 IU,0.00 mcg,0,0,0,0,0,0.70 mcg,0.100 mg,7.0 mg,0,0,0,0,150.00 mg,0.100 mg,1.00 mg,45.00 mg,0.400 mg,203.00 mg,262.00 mg,6.1 mcg,1.40 mg,13.00 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,58.00 g,0,0,0,0,0,0,0,0,8.00 g,2.860 g,3.150 g,1.620 g,4.00 mg,0,17.00 g,0,0,4.00 g
1576,"KEEBLER, Frosted Cookies, ANIMALS",100 g,506,24g,16g,0,257.00 mg,0,59.00 mcg,0,1.700 mg,0,0.150 mg,0.280 mg,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5.00 mg,96.00 mg,0,0,3.00 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,70.00 g,0.9 g,41.10 g,0,0,0,0,0,0,24.10 g,16.200 g,2.400 g,4.500 g,0.00 mg,0,0,0,0,2.00 g
4263,"Fish, drained solids with bone, canned, chum, ...",100 g,141,5.5g,1.5g,39mg,391.00 mg,85.0 mg,20.00 mcg,0.00 mcg,7.000 mg,0.560 mg,0.160 mg,0.020 mg,60.00 IU,18.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,4.40 mcg,0.380 mg,0.0 mg,386.00 IU,1.60 mg,1.60 mg,0.1 mcg,249.00 mg,0.100 mg,0.70 mg,30.00 mg,0.020 mg,354.00 mg,300.00 mg,43.3 mcg,1.00 mg,21.43 g,1.296 g,1.282 g,2.195 g,0.230 g,3.199 g,1.029 g,0.631 g,0,0.988 g,1.742 g,1.968 g,0.634 g,0.837 g,0.758 g,0.874 g,0.940 g,0.240 g,0.724 g,1.104 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,5.50 g,1.486 g,1.919 g,1.517 g,39.00 mg,0.0 g,2.47 g,0.00 mg,0.00 mg,70.77 g
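To make the difference concrete, here is a minimal sketch on a small hypothetical DataFrame (`foods`, not the course's `nutrition` data, though the calorie values are copied from the tables in this section). Without replacement, `n` cannot exceed the number of rows; with `replace=True` it can, and rows repeat. The last part is a bare-bones bootstrap: resample the rows many times and collect the mean of each resample.

```python
import pandas as pd

# Hypothetical three-row stand-in for the nutrition data
foods = pd.DataFrame(
    {"name": ["Cornstarch", "Nuts, pecans", "Eggplant, raw"], "calories": [381, 691, 25]}
)

# Without replacement, asking for more rows than exist raises a ValueError
try:
    foods.sample(n=5)
except ValueError as err:
    print("ValueError:", err)

# With replacement, rows can be picked again, so n may exceed the row count
boot = foods.sample(n=5, replace=True, random_state=0)
print(boot)

# Bare-bones bootstrap: resample the calories (same size as the original) many times
boot_means = [
    foods["calories"].sample(n=len(foods), replace=True, random_state=seed).mean()
    for seed in range(1000)
]
print(pd.Series(boot_means).describe())
```

The spread of `boot_means` gives a rough picture of how much the mean could vary from sample to sample; a real analysis would use far more than three rows.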


**Weighted sampling** refers to weighting our records for sampling. The higher the weight, the higher the likelihood of being selected.

Within the `sample()` method, we do this using the `weights` parameter.
* If `weights` is not used, all records have equal probability weighting; that is, every record is equally likely to be selected.
* This becomes useful when the index labels carry meaning that justifies giving more weight to some records and less weight to others.

As an example, we start by creating a Series of weights, indexed by the labels of the records we want to weight in the main DataFrame.

In [37]:
weights = pd.Series(data = [10, 10, 10, 1, 2], index=[7, 17, 29, 5, 6])
weights

7     10
17    10
29    10
5      1
6      2
dtype: int64

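Now let's use these weights for our sampling. Because a weights Series is aligned with the DataFrame on the index, only records 7, 17, 29, 5, and 6 can be selected; every other record is given a weight of zero. Among those five, the records weighted 10 are the most likely to be picked.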

In [38]:
nutrition.sample(n=3, weights= weights)

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
17,"Peppers, raw, jalapeno",100 g,29,0.4g,0.1g,0,3.00 mg,7.5 mg,27.00 mcg,0.00 mcg,1.280 mg,0.315 mg,0.070 mg,0.040 mg,1078.00 IU,54.00 mcg,67.00 mcg,561.00 mcg,105.00 mcg,861.00 mcg,0,0.00 mcg,0.419 mg,118.6 mg,0.00 IU,3.58 mg,3.58 mg,18.5 mcg,12.00 mg,0.046 mg,0.25 mg,15.00 mg,0.097 mg,26.00 mg,248.00 mg,0.4 mcg,0.14 mg,0.91 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6.50 g,2.8 g,4.12 g,2.63 g,0.00 g,1.48 g,0.00 g,0.00 g,0.00 g,0.37 g,0.092 g,0.029 g,0.112 g,0.00 mg,0.0 g,0.53 g,0.00 mg,0.00 mg,91.69 g
29,"Nuts, dried, pine nuts",100 g,673,68g,4.9g,0,2.00 mg,55.8 mg,34.00 mcg,0.00 mcg,4.387 mg,0.313 mg,0.227 mg,0.364 mg,29.00 IU,1.00 mcg,0.00 mcg,17.00 mcg,0.00 mcg,9.00 mcg,0,0.00 mcg,0.094 mg,0.8 mg,0.00 IU,9.33 mg,9.33 mg,53.9 mcg,16.00 mg,1.324 mg,5.53 mg,251.00 mg,8.802 mg,575.00 mg,597.00 mg,0.7 mcg,6.45 mg,13.69 g,0.684 g,2.413 g,1.303 g,0.289 g,2.926 g,0.691 g,0.341 g,0,0.542 g,0.991 g,0.540 g,0.259 g,0.524 g,0.673 g,0.835 g,0.370 g,0.107 g,0.509 g,0.687 g,13.08 g,3.7 g,3.59 g,0.07 g,0.00 g,0.07 g,0.00 g,0.00 g,3.45 g,68.37 g,4.899 g,18.764 g,34.071 g,0.00 mg,0.0 g,2.59 g,0.00 mg,0.00 mg,2.28 g
7,"Lamb, raw, ground",100 g,282,23g,10g,73mg,59.00 mg,69.3 mg,18.00 mcg,0.00 mcg,5.960 mg,0.650 mg,0.210 mg,0.110 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,2.31 mcg,0.130 mg,0.0 mg,2.00 IU,0.20 mg,0.20 mg,3.6 mcg,16.00 mg,0.101 mg,1.55 mg,21.00 mg,0.019 mg,157.00 mg,222.00 mg,18.8 mcg,3.41 mg,16.56 g,0.996 g,0.984 g,1.457 g,0.198 g,2.402 g,0.809 g,0.524 g,0,0.799 g,1.288 g,1.462 g,0.425 g,0.674 g,0.694 g,0.615 g,0.709 g,0.193 g,0.556 g,0.893 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,23.41 g,10.190 g,9.600 g,1.850 g,73.00 mg,0.0 g,0.87 g,0.00 mg,0.00 mg,59.47 g
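The index alignment is easy to miss, so here is a minimal sketch on a hypothetical eight-row DataFrame. It relies on two behaviours described in the pandas `DataFrame.sample` documentation: a weights Series is aligned on the index (unmatched rows get weight zero), and weights that do not sum to 1 are normalized. It also shows the alternative of passing a column name as `weights`, which works for row sampling (`axis=0`).

```python
import pandas as pd

# Hypothetical DataFrame with the default 0..7 index
df = pd.DataFrame({"item": list("abcdefgh")})

# Weights only for labels 1, 3 and 5; all other rows get weight 0 after alignment
weights = pd.Series([10, 10, 1], index=[1, 3, 5])
picked = df.sample(n=2, weights=weights, random_state=42)
print(picked)

# Weights can also come from a column of the DataFrame itself
df["w"] = [0, 0, 0, 0, 1, 1, 0, 0]
col_picked = df.sample(n=2, weights="w", random_state=42)
print(col_picked)  # only rows 4 and 5 have non-zero weight, so both are chosen
```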


## BONUS - How Are "Random Numbers" Generated?


How does Pandas, or any other random number generator (RNG) for that matter, decide what number to return to us? When we observe random phenomena in the real world, we are witnessing natural processes, or **true randomness**. That is, **what we observe now tells us nothing about what will come next**, so there is no way to predict the next value.

In practice, it is difficult to sample a truly random attribute.

Random.org, for example, measures atmospheric noise and uses that stream of randomness to generate numbers, sequences, names, and so on. This is true randomness, backed by (as far as we can tell) a truly random natural phenomenon.

By contrast, computer programs create **pseudorandomness**. Computers are algorithmic: they follow a fixed set of rules, so on their own they cannot generate truly random numbers. For this reason, computer scientists have developed pseudorandom number generators (**PRNG**s). A PRNG starts from an input value (the *seed*) and then deterministically generates a sequence of numbers from it. The catch is that the sequence is entirely determined by the seed, and because the generator has only a finite number of internal states, it eventually repeats itself.
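Both properties (same seed, same sequence; eventual repetition) show up even in the simplest PRNG. The sketch below is a toy *linear congruential generator* with a deliberately tiny modulus of 16; it is an illustration only and is **not** the Mersenne Twister, whose period is astronomically longer (2^19937 − 1).

```python
def lcg(seed, n, a=5, c=3, m=16):
    """Toy linear congruential generator: x_next = (a * x + c) mod m.

    Illustrative only; the small modulus makes the repetition easy to see.
    """
    values = []
    x = seed
    for _ in range(n):
        x = (a * x + c) % m
        values.append(x)
    return values

first = lcg(seed=7, n=20)
second = lcg(seed=7, n=20)

print(first == second)         # True: the same seed replays the same sequence
print(first[:4], first[16:])   # identical: with m = 16 the sequence repeats every 16 values
```

With these particular constants the generator visits all 16 possible values before repeating; other choices of `a` and `c` can cycle much sooner.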

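Python's built-in `random` module uses the Mersenne Twister, one of the most widely used PRNGs. NumPy's legacy `RandomState`, which Pandas uses when `random_state` is given an integer, is also based on the Mersenne Twister.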

Circling back to the `sample()` method in Pandas: passing a seed to `random_state` locks down the starting input. Because the PRNG is deterministic, the numbers generated from that particular seed will always be the same, so the same records are sampled every time.
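A quick way to see this determinism is to seed twice and compare. The sketch below does it once with Python's `random` module and once with `DataFrame.sample()` on a hypothetical 100-row DataFrame.

```python
import random

import pandas as pd

# Python's random module: re-seeding replays exactly the same sequence
random.seed(42)
run_1 = [random.random() for _ in range(3)]
random.seed(42)
run_2 = [random.random() for _ in range(3)]
print(run_1 == run_2)  # True

# pandas: a fixed random_state selects the same rows every time
df = pd.DataFrame({"x": range(100)})
sample_a = df.sample(n=5, random_state=123)
sample_b = df.sample(n=5, random_state=123)
print(sample_a.equals(sample_b))  # True
```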

Some useful links on random number generators:
* https://www.random.org/randomness/
* https://en.wikipedia.org/wiki/Mersenne_Twister
* https://www.random.org/

## DataFrame Axes

Recall that DataFrames are 2-dimensional. These two dimensions are typically the *index labels* and the *columns*. Thus, to select a specific observation you need to specify two things: a row label and a column label. Think of them as coordinates of sorts.

The `axes` attribute returns a list representing the axes of the DataFrame. The list has two entries:
1. A Pandas Index object representing the index labels
2. A Pandas Index object representing the column labels
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.axes.html

In [39]:
nutrition.axes

[Int64Index([   0,    1,    2,    3,    4,    5,    6,    7,    8,    9,
             ...
             8779, 8780, 8781, 8782, 8783, 8784, 8785, 8786, 8787, 8788],
            dtype='int64', length=8789),
 Index(['name', 'serving_size', 'calories', 'total_fat', 'saturated_fat',
        'cholesterol', 'sodium', 'choline', 'folate', 'folic_acid', 'niacin',
        'pantothenic_acid', 'riboflavin', 'thiamin', 'vitamin_a',
        'vitamin_a_rae', 'carotene_alpha', 'carotene_beta',
        'cryptoxanthin_beta', 'lutein_zeaxanthin', 'lucopene', 'vitamin_b12',
        'vitamin_b6', 'vitamin_c', 'vitamin_d', 'vitamin_e', 'tocopherol_alpha',
        'vitamin_k', 'calcium', 'copper', 'irom', 'magnesium', 'manganese',
        'phosphorous', 'potassium', 'selenium', 'zink', 'protein', 'alanine',
        'arginine', 'aspartic_acid', 'cystine', 'glutamic_acid', 'glycine',
        'histidine', 'hydroxyproline', 'isoleucine', 'leucine', 'lysine',
        'methionine', 'phenylalanine', 'proline', 'ser

Let's look at each one in isolation. For example, we can get the index label at position 3 (the fourth label) by taking the first element of the list (the Index object holding the row labels) and then accessing position 3 within it:

In [40]:
nutrition.axes[0][3]

3

A more direct way of doing this is to access the index using the `index` attribute followed by bracket notation:

In [41]:
nutrition.index[3]

3

Similarly, we can get the column label at position 69 (the 70th column) by taking the second element of the `axes` list (the Index object holding the column labels) and then accessing position 69 within it:

In [42]:
nutrition.axes[1][69]

'polyunsaturated_fatty_acids'

A more direct way to do this would be to access the columns directly using the `columns` attribute followed by bracket notation.

In [43]:
nutrition.columns[69]

'polyunsaturated_fatty_acids'
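To tie these pieces together, here is a minimal sketch on a hypothetical three-row DataFrame (values copied from the nutrition tables in this section). It checks that `axes` simply holds `index` and `columns`, and that one row label plus one column label pins down a single value, like a pair of coordinates.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [381, 691, 25], "protein": ["0.26 g", "9.17 g", "0.98 g"]},
    index=["Cornstarch", "Nuts, pecans", "Eggplant, raw"],
)

row_labels, col_labels = df.axes        # axes is [index, columns]
print(row_labels.equals(df.index))      # True
print(col_labels.equals(df.columns))    # True
print(df.axes[0][2], "|", df.index[2])  # Eggplant, raw | Eggplant, raw
print(df.axes[1][1], "|", df.columns[1])  # protein | protein

# A row label and a column label together identify one value
print(df.loc["Nuts, pecans", "calories"])  # 691
```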

Generally speaking, the `axis` parameter shows up in MANY Pandas DataFrame methods. In any method that supports this parameter, `0` refers to the **rows** (the index) and `1` refers to the **columns**.
* These methods also accept the strings `'index'` and `'columns'` in place of `0` and `1`, though the instructor prefers the integer-based references.
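The direction can feel backwards at first: with `axis=0` an aggregation runs *down* the rows and returns one result per column, while `axis=1` runs *across* the columns and returns one result per row. A minimal sketch on a hypothetical two-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

print(df.sum(axis=0).tolist())          # [6, 60]: one total per column
print(df.sum(axis=1).tolist())          # [11, 22, 33]: one total per row
print(df.sum(axis="columns").tolist())  # [11, 22, 33]: string form of axis=1

# For drop(), axis=1 means "drop the column(s) with these labels"
print(df.drop("b", axis=1).columns.tolist())  # ['a']
```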

## Changing the Index of a DataFrame

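Our `nutrition` DataFrame has an integer-based index whose datatype is `Int64Index`. (A DataFrame created without an explicit index gets a `RangeIndex` instead. Note also that `Int64Index` was removed in pandas 2.0, where the same index displays as `Index([...], dtype='int64')`.)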



In [44]:
nutrition.index

Int64Index([   0,    1,    2,    3,    4,    5,    6,    7,    8,    9,
            ...
            8779, 8780, 8781, 8782, 8783, 8784, 8785, 8786, 8787, 8788],
           dtype='int64', length=8789)

`RangeIndex`, which we saw earlier, is a memory-saving special case of `Int64Index`: rather than storing every label, it stores only the start, stop, and step that define the sequence.

In [45]:
pd.RangeIndex(start=0, stop=8789, step=1)

RangeIndex(start=0, stop=8789, step=1)
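The memory saving is easy to check. This sketch compares a `RangeIndex` with an index built from an explicit NumPy array holding the same 8789 labels; the exact byte counts depend on the platform and pandas version, so only the comparison is printed as a claim.

```python
import numpy as np
import pandas as pd

range_idx = pd.RangeIndex(start=0, stop=8789, step=1)
explicit_idx = pd.Index(np.arange(8789))  # stores every label in an array

print(range_idx.equals(explicit_idx))         # True: same labels
print(range_idx.nbytes, explicit_idx.nbytes)  # RangeIndex needs only a few bytes

# A DataFrame built without an explicit index gets a RangeIndex
default_idx = pd.DataFrame({"a": [1, 2]}).index
print(isinstance(default_idx, pd.RangeIndex))  # True
```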

Let's assign this as the index of our DataFrame. Notice the change in the datatype.

In [46]:
print(type(nutrition.index))
nutrition.index = pd.RangeIndex(start=0, stop=8789, step=1)
print(type(nutrition.index))

<class 'pandas.core.indexes.numeric.Int64Index'>
<class 'pandas.core.indexes.range.RangeIndex'>


Now, what if we wanted to change the index from a series of integers to something else? One solution is to use the `set_index()` method on the DataFrame. We've seen this before.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html

Here, we will use the name of the food as the index. Remember that `set_index()` returns a new DataFrame and leaves the original untouched. To modify the existing DataFrame, set `inplace = True` (or reassign the result).

Also keep in mind that the `drop` parameter in `set_index()` defaults to `True`, so the column used as the new index is removed from the DataFrame's columns.
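Both behaviours can be checked on a small hypothetical DataFrame. The sketch shows that the original is unchanged after `set_index()`, that `drop=False` keeps the column, and that reassigning the result is an alternative to `inplace=True`.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Cornstarch", "Nuts, pecans"], "calories": [381, 691]})

by_name = df.set_index("name")     # a new DataFrame; df itself is unchanged
print(by_name.columns.tolist())    # ['calories']  (drop=True removed 'name')
print(df.columns.tolist())         # ['name', 'calories']

kept = df.set_index("name", drop=False)  # keep 'name' as a column too
print(kept.columns.tolist())       # ['name', 'calories']

# Reassigning the result instead of using inplace=True
df = df.set_index("name")
print(df.index.tolist())           # ['Cornstarch', 'Nuts, pecans']
```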

In [47]:
nutrition.set_index('name')

name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
Cornstarch,100 g,381,0.1g,,0,9.00 mg,0.4 mg,0.00 mcg,0.00 mcg,0.000 mg,0.000 mg,0.000 mg,0.000 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,0.00 mcg,0.000 mg,0.0 mg,0.00 IU,0.00 mg,0.00 mg,0.0 mcg,2.00 mg,0.050 mg,0.47 mg,3.00 mg,0.053 mg,13.00 mg,3.00 mg,2.8 mcg,0.06 mg,0.26 g,0.019 g,0.012 g,0.020 g,0.006 g,0.053 g,0.009 g,0.008 g,0,0.010 g,0.036 g,0.006 g,0.006 g,0.013 g,0.024 g,0.012 g,0.009 g,0.001 g,0.010 g,0.014 g,91.27 g,0.9 g,0.00 g,0,0,0,0,0,0,0.05 g,0.009 g,0.016 g,0.025 g,0.00 mg,0.0 g,0.09 g,0.00 mg,0.00 mg,8.32 g
"Nuts, pecans",100 g,691,72g,6.2g,0,0.00 mg,40.5 mg,22.00 mcg,0.00 mcg,1.167 mg,0.863 mg,0.130 mg,0.660 mg,56.00 IU,3.00 mcg,0.00 mcg,29.00 mcg,9.00 mcg,17.00 mcg,0,0.00 mcg,0.210 mg,1.1 mg,0.00 IU,1.40 mg,1.40 mg,3.5 mcg,70.00 mg,1.200 mg,2.53 mg,121.00 mg,4.500 mg,277.00 mg,410.00 mg,3.8 mcg,4.53 mg,9.17 g,0.397 g,1.177 g,0.929 g,0.152 g,1.829 g,0.453 g,0.262 g,0,0.336 g,0.598 g,0.287 g,0.183 g,0.426 g,0.363 g,0.474 g,0.306 g,0.093 g,0.215 g,0.411 g,13.86 g,9.6 g,3.97 g,0.04 g,0,0.04 g,0.00 g,0.00 g,3.90 g,71.97 g,6.180 g,40.801 g,21.614 g,0.00 mg,0.0 g,1.49 g,0.00 mg,0.00 mg,3.52 g
"Eggplant, raw",100 g,25,0.2g,,0,2.00 mg,6.9 mg,22.00 mcg,0.00 mcg,0.649 mg,0.281 mg,0.037 mg,0.039 mg,23.00 IU,1.00 mcg,0.00 mcg,14.00 mcg,0.00 mcg,36.00 mcg,0,0.00 mcg,0.084 mg,2.2 mg,0.00 IU,0.30 mg,0.30 mg,3.5 mcg,9.00 mg,0.081 mg,0.23 mg,14.00 mg,0.232 mg,24.00 mg,229.00 mg,0.3 mcg,0.16 mg,0.98 g,0.051 g,0.057 g,0.164 g,0.006 g,0.186 g,0.041 g,0.023 g,0,0.045 g,0.064 g,0.047 g,0.011 g,0.043 g,0.043 g,0.042 g,0.037 g,0.009 g,0.027 g,0.053 g,5.88 g,3.0 g,3.53 g,1.54 g,0,1.58 g,0,0,0.26 g,0.18 g,0.034 g,0.016 g,0.076 g,0.00 mg,0.0 g,0.66 g,0.00 mg,0.00 mg,92.30 g
"Teff, uncooked",100 g,367,2.4g,0.4g,0,12.00 mg,13.1 mg,0,0,3.363 mg,0.942 mg,0.270 mg,0.390 mg,9.00 IU,0.00 mcg,0.00 mcg,5.00 mcg,0.00 mcg,66.00 mcg,0,0,0.482 mg,0,0,0.08 mg,0.08 mg,1.9 mcg,180.00 mg,0.810 mg,7.63 mg,184.00 mg,9.240 mg,429.00 mg,427.00 mg,4.4 mcg,3.63 mg,13.30 g,0.747 g,0.517 g,0.820 g,0.236 g,3.349 g,0.477 g,0.301 g,0,0.501 g,1.068 g,0.376 g,0.428 g,0.698 g,0.664 g,0.622 g,0.510 g,0.139 g,0.458 g,0.686 g,73.13 g,8.0 g,1.84 g,0.47 g,0.00 g,0.73 g,0.00 g,0.01 g,0.62 g,2.38 g,0.449 g,0.589 g,1.071 g,0,0,2.37 g,0,0,8.82 g
"Sherbet, orange",100 g,144,2g,1.2g,1mg,46.00 mg,7.7 mg,4.00 mcg,0.00 mcg,0.063 mg,0.224 mg,0.097 mg,0.027 mg,46.00 IU,12.00 mcg,0.00 mcg,1.00 mcg,5.00 mcg,7.00 mcg,0,0.13 mcg,0.023 mg,2.3 mg,0.00 IU,0.01 mg,0.01 mg,0.0 mcg,54.00 mg,0.028 mg,0.14 mg,8.00 mg,0.011 mg,40.00 mg,96.00 mg,1.5 mcg,0.48 mg,1.10 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30.40 g,1.3 g,24.32 g,0,0,0,0,0,0,2.00 g,1.160 g,0.530 g,0.080 g,1.00 mg,0.0 g,0.40 g,0.00 mg,0.00 mg,66.10 g
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Beef, raw, all grades, trimmed to 0"" fat, separable lean and fat, boneless, top round roast, round",100 g,125,3.5g,1.4g,62mg,54.00 mg,64.5 mg,4.00 mcg,0.00 mcg,6.422 mg,0.356 mg,0.234 mg,0.063 mg,11.00 IU,3.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,1.64 mcg,0.631 mg,0.0 mg,1.00 IU,0.23 mg,0.23 mg,1.5 mcg,13.00 mg,0.048 mg,2.33 mg,12.00 mg,0.004 mg,219.00 mg,311.00 mg,22.1 mcg,3.67 mg,23.45 g,1.454 g,1.597 g,2.285 g,0.239 g,3.834 g,1.154 g,0.879 g,0.160 g,1.092 g,2.021 g,2.246 g,0.635 g,0.941 g,1.052 g,0.966 g,1.105 g,0.262 g,0.874 g,1.172 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,3.50 g,1.353 g,1.554 g,0.244 g,62.00 mg,0.0 g,1.11 g,0.00 mg,0.00 mg,72.51 g
"Lamb, cooked, separable lean only, composite of trimmed retail cuts, frozen, imported, New Zealand",100 g,206,8.9g,3.9g,109mg,50.00 mg,0,0.00 mcg,0.00 mcg,7.680 mg,0.580 mg,0.500 mg,0.130 mg,0.00 IU,0.00 mcg,0,0,0,0,0,2.95 mcg,0.140 mg,0.0 mg,0,0.19 mg,0.19 mg,0,13.00 mg,0.114 mg,2.35 mg,22.00 mg,0.029 mg,246.00 mg,188.00 mg,2.0 mcg,4.30 mg,29.59 g,1.780 g,1.758 g,2.605 g,0.353 g,4.294 g,1.445 g,0.937 g,0,1.428 g,2.302 g,2.613 g,0.759 g,1.205 g,1.241 g,1.100 g,1.267 g,0.346 g,0.995 g,1.597 g,0.00 g,0.0 g,0,0,0,0,0,0,0,8.86 g,3.860 g,3.480 g,0.520 g,109.00 mg,0,1.60 g,0,0,59.95 g
"Lamb, raw, separable lean and fat, composite of trimmed retail cuts, frozen, imported, New Zealand",100 g,277,23g,12g,78mg,39.00 mg,0,1.00 mcg,0.00 mcg,6.550 mg,0.520 mg,0.320 mg,0.130 mg,0.00 IU,0.00 mcg,0,0,0,0,0,2.42 mcg,0.110 mg,0.0 mg,0,0.21 mg,0.21 mg,0,13.00 mg,0.083 mg,1.49 mg,15.00 mg,0.018 mg,168.00 mg,136.00 mg,1.3 mcg,2.39 mg,16.74 g,1.007 g,0.994 g,1.473 g,0.200 g,2.429 g,0.818 g,0.530 g,0,0.808 g,1.302 g,1.478 g,0.430 g,0.681 g,0.702 g,0.622 g,0.716 g,0.196 g,0.563 g,0.903 g,0.00 g,0.0 g,0,0,0,0,0,0,0,22.74 g,11.570 g,8.720 g,0.980 g,78.00 mg,0,0.92 g,0,0,59.80 g
"Beef, raw, all grades, trimmed to 0"" fat, separable lean only, boneless, eye of round roast, round",100 g,121,3g,1.1g,60mg,53.00 mg,64.2 mg,4.00 mcg,0.00 mcg,6.720 mg,0.355 mg,0.184 mg,0.063 mg,4.00 IU,1.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,1.84 mcg,0.644 mg,0.0 mg,1.00 IU,0.24 mg,0.24 mg,1.5 mcg,13.00 mg,0.042 mg,1.45 mg,12.00 mg,0.001 mg,222.00 mg,319.00 mg,22.6 mcg,3.42 mg,23.37 g,1.525 g,1.714 g,2.468 g,0.256 g,4.167 g,1.101 g,0.948 g,0.118 g,1.192 g,2.198 g,2.457 g,0.679 g,1.018 g,1.073 g,1.037 g,1.201 g,0.287 g,0.954 g,1.259 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,3.04 g,1.086 g,1.266 g,0.233 g,60.00 mg,0.0 g,1.10 g,0.00 mg,0.00 mg,73.43 g


The `append` parameter in `set_index()` defaults to `False`. Setting it to `True` adds the new column as an extra level alongside the existing index instead of replacing it, which creates a DataFrame with a `MultiIndex`. We will revisit this concept later on.

In [48]:
multi_ind = nutrition.set_index(['folic_acid', 'name'], drop=False, append=True).head(5)
multi_ind

Unnamed: 0,folic_acid,name,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
0,0.00 mcg,Cornstarch,Cornstarch,100 g,381,0.1g,,0,9.00 mg,0.4 mg,0.00 mcg,0.00 mcg,0.000 mg,0.000 mg,0.000 mg,0.000 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,0.00 mcg,0.000 mg,0.0 mg,0.00 IU,0.00 mg,0.00 mg,0.0 mcg,2.00 mg,0.050 mg,0.47 mg,3.00 mg,0.053 mg,13.00 mg,3.00 mg,2.8 mcg,0.06 mg,0.26 g,0.019 g,0.012 g,0.020 g,0.006 g,0.053 g,0.009 g,0.008 g,0,0.010 g,0.036 g,0.006 g,0.006 g,0.013 g,0.024 g,0.012 g,0.009 g,0.001 g,0.010 g,0.014 g,91.27 g,0.9 g,0.00 g,0,0,0,0,0,0,0.05 g,0.009 g,0.016 g,0.025 g,0.00 mg,0.0 g,0.09 g,0.00 mg,0.00 mg,8.32 g
1,0.00 mcg,"Nuts, pecans","Nuts, pecans",100 g,691,72g,6.2g,0,0.00 mg,40.5 mg,22.00 mcg,0.00 mcg,1.167 mg,0.863 mg,0.130 mg,0.660 mg,56.00 IU,3.00 mcg,0.00 mcg,29.00 mcg,9.00 mcg,17.00 mcg,0,0.00 mcg,0.210 mg,1.1 mg,0.00 IU,1.40 mg,1.40 mg,3.5 mcg,70.00 mg,1.200 mg,2.53 mg,121.00 mg,4.500 mg,277.00 mg,410.00 mg,3.8 mcg,4.53 mg,9.17 g,0.397 g,1.177 g,0.929 g,0.152 g,1.829 g,0.453 g,0.262 g,0,0.336 g,0.598 g,0.287 g,0.183 g,0.426 g,0.363 g,0.474 g,0.306 g,0.093 g,0.215 g,0.411 g,13.86 g,9.6 g,3.97 g,0.04 g,0,0.04 g,0.00 g,0.00 g,3.90 g,71.97 g,6.180 g,40.801 g,21.614 g,0.00 mg,0.0 g,1.49 g,0.00 mg,0.00 mg,3.52 g
2,0.00 mcg,"Eggplant, raw","Eggplant, raw",100 g,25,0.2g,,0,2.00 mg,6.9 mg,22.00 mcg,0.00 mcg,0.649 mg,0.281 mg,0.037 mg,0.039 mg,23.00 IU,1.00 mcg,0.00 mcg,14.00 mcg,0.00 mcg,36.00 mcg,0,0.00 mcg,0.084 mg,2.2 mg,0.00 IU,0.30 mg,0.30 mg,3.5 mcg,9.00 mg,0.081 mg,0.23 mg,14.00 mg,0.232 mg,24.00 mg,229.00 mg,0.3 mcg,0.16 mg,0.98 g,0.051 g,0.057 g,0.164 g,0.006 g,0.186 g,0.041 g,0.023 g,0,0.045 g,0.064 g,0.047 g,0.011 g,0.043 g,0.043 g,0.042 g,0.037 g,0.009 g,0.027 g,0.053 g,5.88 g,3.0 g,3.53 g,1.54 g,0,1.58 g,0,0,0.26 g,0.18 g,0.034 g,0.016 g,0.076 g,0.00 mg,0.0 g,0.66 g,0.00 mg,0.00 mg,92.30 g
3,0,"Teff, uncooked","Teff, uncooked",100 g,367,2.4g,0.4g,0,12.00 mg,13.1 mg,0,0,3.363 mg,0.942 mg,0.270 mg,0.390 mg,9.00 IU,0.00 mcg,0.00 mcg,5.00 mcg,0.00 mcg,66.00 mcg,0,0,0.482 mg,0,0,0.08 mg,0.08 mg,1.9 mcg,180.00 mg,0.810 mg,7.63 mg,184.00 mg,9.240 mg,429.00 mg,427.00 mg,4.4 mcg,3.63 mg,13.30 g,0.747 g,0.517 g,0.820 g,0.236 g,3.349 g,0.477 g,0.301 g,0,0.501 g,1.068 g,0.376 g,0.428 g,0.698 g,0.664 g,0.622 g,0.510 g,0.139 g,0.458 g,0.686 g,73.13 g,8.0 g,1.84 g,0.47 g,0.00 g,0.73 g,0.00 g,0.01 g,0.62 g,2.38 g,0.449 g,0.589 g,1.071 g,0,0,2.37 g,0,0,8.82 g
4,0.00 mcg,"Sherbet, orange","Sherbet, orange",100 g,144,2g,1.2g,1mg,46.00 mg,7.7 mg,4.00 mcg,0.00 mcg,0.063 mg,0.224 mg,0.097 mg,0.027 mg,46.00 IU,12.00 mcg,0.00 mcg,1.00 mcg,5.00 mcg,7.00 mcg,0,0.13 mcg,0.023 mg,2.3 mg,0.00 IU,0.01 mg,0.01 mg,0.0 mcg,54.00 mg,0.028 mg,0.14 mg,8.00 mg,0.011 mg,40.00 mg,96.00 mg,1.5 mcg,0.48 mg,1.10 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30.40 g,1.3 g,24.32 g,0,0,0,0,0,0,2.00 g,1.160 g,0.530 g,0.080 g,1.00 mg,0.0 g,0.40 g,0.00 mg,0.00 mg,66.10 g


Inspecting the `axes` attribute of this DataFrame shows that the index axis (axis 0) is now a `MultiIndex`.

In [49]:
multi_ind.axes

[MultiIndex([(0, '0.00 mcg',      'Cornstarch'),
             (1, '0.00 mcg',    'Nuts, pecans'),
             (2, '0.00 mcg',   'Eggplant, raw'),
             (3,        '0',  'Teff, uncooked'),
             (4, '0.00 mcg', 'Sherbet, orange')],
            names=[None, 'folic_acid', 'name']),
 Index(['name', 'serving_size', 'calories', 'total_fat', 'saturated_fat',
        'cholesterol', 'sodium', 'choline', 'folate', 'folic_acid', 'niacin',
        'pantothenic_acid', 'riboflavin', 'thiamin', 'vitamin_a',
        'vitamin_a_rae', 'carotene_alpha', 'carotene_beta',
        'cryptoxanthin_beta', 'lutein_zeaxanthin', 'lucopene', 'vitamin_b12',
        'vitamin_b6', 'vitamin_c', 'vitamin_d', 'vitamin_e', 'tocopherol_alpha',
        'vitamin_k', 'calcium', 'copper', 'irom', 'magnesium', 'manganese',
        'phosphorous', 'potassium', 'selenium', 'zink', 'protein', 'alanine',
        'arginine', 'aspartic_acid', 'cystine', 'glutamic_acid', 'glycine',
        'histidine', 'hydroxyproline',
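Here is the same idea on a hypothetical two-row DataFrame (values copied from the table above), as a minimal sketch. Passing a list of columns to `set_index()` with `append=True` adds each one as a level after the existing index, so the unnamed 0..n-1 index stays as the outermost level, and a full tuple of labels selects a row.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Cornstarch", "Teff, uncooked"],
        "folic_acid": ["0.00 mcg", "0"],
        "calories": [381, 367],
    }
)

multi = df.set_index(["folic_acid", "name"], drop=False, append=True)
print(type(multi.index).__name__)  # MultiIndex
print(multi.index.nlevels)         # 3
print(list(multi.index.names))     # [None, 'folic_acid', 'name']

# One label per level selects a single row
print(multi.loc[(1, "0", "Teff, uncooked"), "calories"])  # 367
```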

Lastly, the `set_index()` method has a `verify_integrity` parameter. When it is set to `True`, Pandas checks whether the new index contains only unique values. If it does not, the method raises a `ValueError`.

To illustrate, let's use the `value_counts()` method to check whether any foods share the same calorie count. Sure enough, they do.

In [50]:
nutrition.calories.value_counts()

884    78
47     45
56     43
0      39
63     38
       ..
593     1
657     1
665     1
673     1
727     1
Name: calories, Length: 671, dtype: int64

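Now let's try to set *calories* as the index with `verify_integrity=True`. Because calorie counts repeat, this raises a `ValueError`, which is why the line is commented out.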

In [51]:
# nutrition.set_index('calories', verify_integrity=True)
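To see the error without breaking the notebook, wrap the call in `try`/`except`. This sketch uses a hypothetical three-row DataFrame in which two foods share a calorie count; without `verify_integrity`, the same call silently produces a non-unique index.

```python
import pandas as pd

# Hypothetical data: foods "A" and "B" share the same calorie count
df = pd.DataFrame({"name": ["A", "B", "C"], "calories": [47, 47, 56]})
print(df["calories"].is_unique)  # False

try:
    df.set_index("calories", verify_integrity=True)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Without verify_integrity, the duplicate labels are accepted silently
dup = df.set_index("calories")
print(dup.index.is_unique)  # False
```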

Although Pandas allows duplicate labels in the index, the instructor recommends *against* relying on this. More often than not, it leads to poor performance and errors down the road. For example, accessing observations by index label becomes less useful: `loc` with a duplicated label returns every matching row as a DataFrame instead of a single row as a Series.
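The change in return type is easy to demonstrate. In this sketch (hypothetical data with the label `47` used twice), a unique label gives back one row as a Series, while the duplicated label gives back a DataFrame, so code that expects a single row can break.

```python
import pandas as pd

df = pd.DataFrame({"name": ["A", "B", "C"]}, index=[47, 47, 56])

print(type(df.loc[56]).__name__)  # Series: unique label, one row
print(type(df.loc[47]).__name__)  # DataFrame: duplicated label, all matching rows
print(len(df.loc[47]))            # 2
```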

## Extracting from DataFrames by Label

Many of the same methods that we explored for Series also apply to DataFrames! Even better, because DataFrames have two dimensions, we can slice and extract along both rows and columns.

Let's start with the `loc` attribute, which we used many times with Series.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html

First, let's set the `name` column as the index.


In [52]:
nutrition.head()

,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
0,Cornstarch,100 g,381,0.1g,,0,9.00 mg,0.4 mg,0.00 mcg,0.00 mcg,0.000 mg,0.000 mg,0.000 mg,0.000 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,0.00 mcg,0.000 mg,0.0 mg,0.00 IU,0.00 mg,0.00 mg,0.0 mcg,2.00 mg,0.050 mg,0.47 mg,3.00 mg,0.053 mg,13.00 mg,3.00 mg,2.8 mcg,0.06 mg,0.26 g,0.019 g,0.012 g,0.020 g,0.006 g,0.053 g,0.009 g,0.008 g,0,0.010 g,0.036 g,0.006 g,0.006 g,0.013 g,0.024 g,0.012 g,0.009 g,0.001 g,0.010 g,0.014 g,91.27 g,0.9 g,0.00 g,0,0,0,0,0,0,0.05 g,0.009 g,0.016 g,0.025 g,0.00 mg,0.0 g,0.09 g,0.00 mg,0.00 mg,8.32 g
1,"Nuts, pecans",100 g,691,72g,6.2g,0,0.00 mg,40.5 mg,22.00 mcg,0.00 mcg,1.167 mg,0.863 mg,0.130 mg,0.660 mg,56.00 IU,3.00 mcg,0.00 mcg,29.00 mcg,9.00 mcg,17.00 mcg,0,0.00 mcg,0.210 mg,1.1 mg,0.00 IU,1.40 mg,1.40 mg,3.5 mcg,70.00 mg,1.200 mg,2.53 mg,121.00 mg,4.500 mg,277.00 mg,410.00 mg,3.8 mcg,4.53 mg,9.17 g,0.397 g,1.177 g,0.929 g,0.152 g,1.829 g,0.453 g,0.262 g,0,0.336 g,0.598 g,0.287 g,0.183 g,0.426 g,0.363 g,0.474 g,0.306 g,0.093 g,0.215 g,0.411 g,13.86 g,9.6 g,3.97 g,0.04 g,0,0.04 g,0.00 g,0.00 g,3.90 g,71.97 g,6.180 g,40.801 g,21.614 g,0.00 mg,0.0 g,1.49 g,0.00 mg,0.00 mg,3.52 g
2,"Eggplant, raw",100 g,25,0.2g,,0,2.00 mg,6.9 mg,22.00 mcg,0.00 mcg,0.649 mg,0.281 mg,0.037 mg,0.039 mg,23.00 IU,1.00 mcg,0.00 mcg,14.00 mcg,0.00 mcg,36.00 mcg,0,0.00 mcg,0.084 mg,2.2 mg,0.00 IU,0.30 mg,0.30 mg,3.5 mcg,9.00 mg,0.081 mg,0.23 mg,14.00 mg,0.232 mg,24.00 mg,229.00 mg,0.3 mcg,0.16 mg,0.98 g,0.051 g,0.057 g,0.164 g,0.006 g,0.186 g,0.041 g,0.023 g,0,0.045 g,0.064 g,0.047 g,0.011 g,0.043 g,0.043 g,0.042 g,0.037 g,0.009 g,0.027 g,0.053 g,5.88 g,3.0 g,3.53 g,1.54 g,0,1.58 g,0,0,0.26 g,0.18 g,0.034 g,0.016 g,0.076 g,0.00 mg,0.0 g,0.66 g,0.00 mg,0.00 mg,92.30 g
3,"Teff, uncooked",100 g,367,2.4g,0.4g,0,12.00 mg,13.1 mg,0,0,3.363 mg,0.942 mg,0.270 mg,0.390 mg,9.00 IU,0.00 mcg,0.00 mcg,5.00 mcg,0.00 mcg,66.00 mcg,0,0,0.482 mg,0,0,0.08 mg,0.08 mg,1.9 mcg,180.00 mg,0.810 mg,7.63 mg,184.00 mg,9.240 mg,429.00 mg,427.00 mg,4.4 mcg,3.63 mg,13.30 g,0.747 g,0.517 g,0.820 g,0.236 g,3.349 g,0.477 g,0.301 g,0,0.501 g,1.068 g,0.376 g,0.428 g,0.698 g,0.664 g,0.622 g,0.510 g,0.139 g,0.458 g,0.686 g,73.13 g,8.0 g,1.84 g,0.47 g,0.00 g,0.73 g,0.00 g,0.01 g,0.62 g,2.38 g,0.449 g,0.589 g,1.071 g,0,0,2.37 g,0,0,8.82 g
4,"Sherbet, orange",100 g,144,2g,1.2g,1mg,46.00 mg,7.7 mg,4.00 mcg,0.00 mcg,0.063 mg,0.224 mg,0.097 mg,0.027 mg,46.00 IU,12.00 mcg,0.00 mcg,1.00 mcg,5.00 mcg,7.00 mcg,0,0.13 mcg,0.023 mg,2.3 mg,0.00 IU,0.01 mg,0.01 mg,0.0 mcg,54.00 mg,0.028 mg,0.14 mg,8.00 mg,0.011 mg,40.00 mg,96.00 mg,1.5 mcg,0.48 mg,1.10 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30.40 g,1.3 g,24.32 g,0,0,0,0,0,0,2.00 g,1.160 g,0.530 g,0.080 g,1.00 mg,0.0 g,0.40 g,0.00 mg,0.00 mg,66.10 g


In [53]:
nutrition.set_index('name', inplace=True)


In [54]:
nutrition.head()

name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
Cornstarch,100 g,381,0.1g,,0,9.00 mg,0.4 mg,0.00 mcg,0.00 mcg,0.000 mg,0.000 mg,0.000 mg,0.000 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,0.00 mcg,0.000 mg,0.0 mg,0.00 IU,0.00 mg,0.00 mg,0.0 mcg,2.00 mg,0.050 mg,0.47 mg,3.00 mg,0.053 mg,13.00 mg,3.00 mg,2.8 mcg,0.06 mg,0.26 g,0.019 g,0.012 g,0.020 g,0.006 g,0.053 g,0.009 g,0.008 g,0,0.010 g,0.036 g,0.006 g,0.006 g,0.013 g,0.024 g,0.012 g,0.009 g,0.001 g,0.010 g,0.014 g,91.27 g,0.9 g,0.00 g,0,0,0,0,0,0,0.05 g,0.009 g,0.016 g,0.025 g,0.00 mg,0.0 g,0.09 g,0.00 mg,0.00 mg,8.32 g
"Nuts, pecans",100 g,691,72g,6.2g,0,0.00 mg,40.5 mg,22.00 mcg,0.00 mcg,1.167 mg,0.863 mg,0.130 mg,0.660 mg,56.00 IU,3.00 mcg,0.00 mcg,29.00 mcg,9.00 mcg,17.00 mcg,0,0.00 mcg,0.210 mg,1.1 mg,0.00 IU,1.40 mg,1.40 mg,3.5 mcg,70.00 mg,1.200 mg,2.53 mg,121.00 mg,4.500 mg,277.00 mg,410.00 mg,3.8 mcg,4.53 mg,9.17 g,0.397 g,1.177 g,0.929 g,0.152 g,1.829 g,0.453 g,0.262 g,0,0.336 g,0.598 g,0.287 g,0.183 g,0.426 g,0.363 g,0.474 g,0.306 g,0.093 g,0.215 g,0.411 g,13.86 g,9.6 g,3.97 g,0.04 g,0,0.04 g,0.00 g,0.00 g,3.90 g,71.97 g,6.180 g,40.801 g,21.614 g,0.00 mg,0.0 g,1.49 g,0.00 mg,0.00 mg,3.52 g
"Eggplant, raw",100 g,25,0.2g,,0,2.00 mg,6.9 mg,22.00 mcg,0.00 mcg,0.649 mg,0.281 mg,0.037 mg,0.039 mg,23.00 IU,1.00 mcg,0.00 mcg,14.00 mcg,0.00 mcg,36.00 mcg,0,0.00 mcg,0.084 mg,2.2 mg,0.00 IU,0.30 mg,0.30 mg,3.5 mcg,9.00 mg,0.081 mg,0.23 mg,14.00 mg,0.232 mg,24.00 mg,229.00 mg,0.3 mcg,0.16 mg,0.98 g,0.051 g,0.057 g,0.164 g,0.006 g,0.186 g,0.041 g,0.023 g,0,0.045 g,0.064 g,0.047 g,0.011 g,0.043 g,0.043 g,0.042 g,0.037 g,0.009 g,0.027 g,0.053 g,5.88 g,3.0 g,3.53 g,1.54 g,0,1.58 g,0,0,0.26 g,0.18 g,0.034 g,0.016 g,0.076 g,0.00 mg,0.0 g,0.66 g,0.00 mg,0.00 mg,92.30 g
"Teff, uncooked",100 g,367,2.4g,0.4g,0,12.00 mg,13.1 mg,0,0,3.363 mg,0.942 mg,0.270 mg,0.390 mg,9.00 IU,0.00 mcg,0.00 mcg,5.00 mcg,0.00 mcg,66.00 mcg,0,0,0.482 mg,0,0,0.08 mg,0.08 mg,1.9 mcg,180.00 mg,0.810 mg,7.63 mg,184.00 mg,9.240 mg,429.00 mg,427.00 mg,4.4 mcg,3.63 mg,13.30 g,0.747 g,0.517 g,0.820 g,0.236 g,3.349 g,0.477 g,0.301 g,0,0.501 g,1.068 g,0.376 g,0.428 g,0.698 g,0.664 g,0.622 g,0.510 g,0.139 g,0.458 g,0.686 g,73.13 g,8.0 g,1.84 g,0.47 g,0.00 g,0.73 g,0.00 g,0.01 g,0.62 g,2.38 g,0.449 g,0.589 g,1.071 g,0,0,2.37 g,0,0,8.82 g
"Sherbet, orange",100 g,144,2g,1.2g,1mg,46.00 mg,7.7 mg,4.00 mcg,0.00 mcg,0.063 mg,0.224 mg,0.097 mg,0.027 mg,46.00 IU,12.00 mcg,0.00 mcg,1.00 mcg,5.00 mcg,7.00 mcg,0,0.13 mcg,0.023 mg,2.3 mg,0.00 IU,0.01 mg,0.01 mg,0.0 mcg,54.00 mg,0.028 mg,0.14 mg,8.00 mg,0.011 mg,40.00 mg,96.00 mg,1.5 mcg,0.48 mg,1.10 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30.40 g,1.3 g,24.32 g,0,0,0,0,0,0,2.00 g,1.160 g,0.530 g,0.080 g,1.00 mg,0.0 g,0.40 g,0.00 mg,0.00 mg,66.10 g


Now use `loc` to access a particular row by its index label (here, the food name).

In [55]:
nutrition.loc['Eggplant, raw']

serving_size       100 g
calories              25
total_fat           0.2g
saturated_fat        NaN
cholesterol            0
                  ...   
alcohol            0.0 g
ash               0.66 g
caffeine         0.00 mg
theobromine      0.00 mg
water            92.30 g
Name: Eggplant, raw, Length: 75, dtype: object

Note that what we get back is a Pandas Series: its index holds the DataFrame's column names, and its values are that row's entries.

In [56]:
type(nutrition.loc['Eggplant, raw'])

pandas.core.series.Series
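Whether `loc` returns a Series or a DataFrame depends on what you pass it. A minimal sketch on a toy two-row DataFrame (values copied from the outputs above; the frame itself is hypothetical): a single label gives a Series named after that row, while a list containing one label gives a one-row DataFrame.

```python
import pandas as pd

# Hypothetical two-row table indexed by food name
df = pd.DataFrame(
    {"calories": [25, 367], "water": ["92.30 g", "8.82 g"]},
    index=pd.Index(["Eggplant, raw", "Teff, uncooked"], name="name"),
)

row = df.loc["Eggplant, raw"]       # single label -> Series
print(type(row).__name__)            # Series
print(row.index.tolist())            # ['calories', 'water']
print(row.name)                      # Eggplant, raw

table = df.loc[["Eggplant, raw"]]   # list of one label -> DataFrame
print(type(table).__name__)          # DataFrame
```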

What if we want one **specific** entry using `loc`? One way is to chain a second set of square brackets containing the column name of the property we want. This works because `nutrition.loc['Eggplant, raw']` returns a Series, which we then index by label.

In [57]:
nutrition.loc['Eggplant, raw']['calories']

25

Alternatively, `loc` can access a specific entry in one go. It accepts two arguments inside the square brackets, which can be thought of as row and column:

`.loc[name_of_row, name_of_column]`

In [58]:
nutrition.loc['Eggplant, raw', 'calories']

25
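For reading, the chained form and the single-call form return the same value, as this small sketch with a hypothetical two-row frame shows. For *writing*, prefer the single `.loc[row, column]` call: a chained assignment such as `df.loc[row][column] = value` may act on a temporary object and leave the original DataFrame unchanged (pandas warns about this, with the exact behavior depending on the pandas version).

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [25, 367]},
    index=["Eggplant, raw", "Teff, uncooked"],
)

# Two-step (chained) lookup: .loc returns a row Series, then ['calories'] indexes it
chained = df.loc["Eggplant, raw"]["calories"]

# Single-step lookup: row and column labels in one .loc call
direct = df.loc["Eggplant, raw", "calories"]
print(chained == direct)  # True

# For assignment, use the single-call form so the original DataFrame is modified
df.loc["Eggplant, raw", "calories"] = 26
print(df.loc["Eggplant, raw", "calories"])  # 26
```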

We can also slice DataFrames with `loc`. Here we provide a range of labels for both dimensions, so Pandas returns the subset of the DataFrame spanning those ranges. Unlike standard Python slicing, label-based slices with `loc` include both the start label and the end label.

In [59]:
nutrition.loc['Eggplant, raw':'Sherbet, orange', 'calories':'cholesterol']

name,calories,total_fat,saturated_fat,cholesterol
"Eggplant, raw",25,0.2g,,0
"Teff, uncooked",367,2.4g,0.4g,0
"Sherbet, orange",144,2g,1.2g,1mg
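The inclusive behavior of label slices is easy to miss, so here is a minimal comparison on a hypothetical frame whose row labels and calorie values mirror the first five foods shown earlier. The `loc` slice keeps its end label, while the equivalent `iloc` slice stops before its end position.

```python
import pandas as pd

# Hypothetical frame mirroring the first five rows of the nutrition data
df = pd.DataFrame(
    {"calories": [381, 691, 25, 367, 144]},
    index=["Cornstarch", "Nuts, pecans", "Eggplant, raw", "Teff, uncooked", "Sherbet, orange"],
)

# Label-based slicing with .loc includes BOTH endpoints...
by_label = df.loc["Eggplant, raw":"Sherbet, orange"]
print(by_label.index.tolist())  # ['Eggplant, raw', 'Teff, uncooked', 'Sherbet, orange']

# ...whereas position-based slicing with .iloc excludes the stop position
by_position = df.iloc[2:4]
print(by_position.index.tolist())  # ['Eggplant, raw', 'Teff, uncooked']
```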


Another indexing technique is to pass in **lists** specifying the rows and columns you are interested in. The first list contains the row labels you want, and the second contains the column labels you want.

This is VERY powerful. If you want just a few entries from your DataFrame and only specific columns, this is the way to do it.

In [60]:
nutrition.loc[
              ['Raspberries, raw'],
              ['protein', 'vitamin_b6']
]

name,protein,vitamin_b6
"Raspberries, raw",1.20 g,0.055 mg


In [61]:
nutrition.loc[
              ['Raspberries, raw','Blackberries, raw'],
              ['protein', 'vitamin_b6', 'water']
]

name,protein,vitamin_b6,water
"Raspberries, raw",1.20 g,0.055 mg,85.75 g
"Blackberries, raw",1.39 g,0.030 mg,88.15 g
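Two details of list-based selection, sketched on a toy frame built from the values above: the result follows the order of the labels in your lists, and (in pandas 1.0 and later) asking for a label that is not present raises a `KeyError`. The label `"Strawberries, raw"` is a hypothetical missing entry used only for illustration.

```python
import pandas as pd

# Toy data with values copied from the output above
df = pd.DataFrame(
    {"protein": ["1.20 g", "1.39 g"], "water": ["85.75 g", "88.15 g"]},
    index=["Raspberries, raw", "Blackberries, raw"],
)

# Rows and columns come back in the order the lists specify
subset = df.loc[["Blackberries, raw", "Raspberries, raw"], ["water", "protein"]]
print(subset.index.tolist())    # ['Blackberries, raw', 'Raspberries, raw']
print(subset.columns.tolist())  # ['water', 'protein']

# Asking for a label that does not exist raises a KeyError
try:
    df.loc[["Strawberries, raw"], ["protein"]]
except KeyError as err:
    print("KeyError:", err)
```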


## DataFrame Extraction by Position

We previously used `iloc` to extract from Series by position. We can do the same thing with DataFrames.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html

In [62]:
nutrition.head(10)

name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
Cornstarch,100 g,381,0.1g,,0,9.00 mg,0.4 mg,0.00 mcg,0.00 mcg,0.000 mg,0.000 mg,0.000 mg,0.000 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,0.00 mcg,0.000 mg,0.0 mg,0.00 IU,0.00 mg,0.00 mg,0.0 mcg,2.00 mg,0.050 mg,0.47 mg,3.00 mg,0.053 mg,13.00 mg,3.00 mg,2.8 mcg,0.06 mg,0.26 g,0.019 g,0.012 g,0.020 g,0.006 g,0.053 g,0.009 g,0.008 g,0,0.010 g,0.036 g,0.006 g,0.006 g,0.013 g,0.024 g,0.012 g,0.009 g,0.001 g,0.010 g,0.014 g,91.27 g,0.9 g,0.00 g,0,0,0,0,0,0,0.05 g,0.009 g,0.016 g,0.025 g,0.00 mg,0.0 g,0.09 g,0.00 mg,0.00 mg,8.32 g
"Nuts, pecans",100 g,691,72g,6.2g,0,0.00 mg,40.5 mg,22.00 mcg,0.00 mcg,1.167 mg,0.863 mg,0.130 mg,0.660 mg,56.00 IU,3.00 mcg,0.00 mcg,29.00 mcg,9.00 mcg,17.00 mcg,0,0.00 mcg,0.210 mg,1.1 mg,0.00 IU,1.40 mg,1.40 mg,3.5 mcg,70.00 mg,1.200 mg,2.53 mg,121.00 mg,4.500 mg,277.00 mg,410.00 mg,3.8 mcg,4.53 mg,9.17 g,0.397 g,1.177 g,0.929 g,0.152 g,1.829 g,0.453 g,0.262 g,0,0.336 g,0.598 g,0.287 g,0.183 g,0.426 g,0.363 g,0.474 g,0.306 g,0.093 g,0.215 g,0.411 g,13.86 g,9.6 g,3.97 g,0.04 g,0,0.04 g,0.00 g,0.00 g,3.90 g,71.97 g,6.180 g,40.801 g,21.614 g,0.00 mg,0.0 g,1.49 g,0.00 mg,0.00 mg,3.52 g
"Eggplant, raw",100 g,25,0.2g,,0,2.00 mg,6.9 mg,22.00 mcg,0.00 mcg,0.649 mg,0.281 mg,0.037 mg,0.039 mg,23.00 IU,1.00 mcg,0.00 mcg,14.00 mcg,0.00 mcg,36.00 mcg,0,0.00 mcg,0.084 mg,2.2 mg,0.00 IU,0.30 mg,0.30 mg,3.5 mcg,9.00 mg,0.081 mg,0.23 mg,14.00 mg,0.232 mg,24.00 mg,229.00 mg,0.3 mcg,0.16 mg,0.98 g,0.051 g,0.057 g,0.164 g,0.006 g,0.186 g,0.041 g,0.023 g,0,0.045 g,0.064 g,0.047 g,0.011 g,0.043 g,0.043 g,0.042 g,0.037 g,0.009 g,0.027 g,0.053 g,5.88 g,3.0 g,3.53 g,1.54 g,0,1.58 g,0,0,0.26 g,0.18 g,0.034 g,0.016 g,0.076 g,0.00 mg,0.0 g,0.66 g,0.00 mg,0.00 mg,92.30 g
"Teff, uncooked",100 g,367,2.4g,0.4g,0,12.00 mg,13.1 mg,0,0,3.363 mg,0.942 mg,0.270 mg,0.390 mg,9.00 IU,0.00 mcg,0.00 mcg,5.00 mcg,0.00 mcg,66.00 mcg,0,0,0.482 mg,0,0,0.08 mg,0.08 mg,1.9 mcg,180.00 mg,0.810 mg,7.63 mg,184.00 mg,9.240 mg,429.00 mg,427.00 mg,4.4 mcg,3.63 mg,13.30 g,0.747 g,0.517 g,0.820 g,0.236 g,3.349 g,0.477 g,0.301 g,0,0.501 g,1.068 g,0.376 g,0.428 g,0.698 g,0.664 g,0.622 g,0.510 g,0.139 g,0.458 g,0.686 g,73.13 g,8.0 g,1.84 g,0.47 g,0.00 g,0.73 g,0.00 g,0.01 g,0.62 g,2.38 g,0.449 g,0.589 g,1.071 g,0,0,2.37 g,0,0,8.82 g
"Sherbet, orange",100 g,144,2g,1.2g,1mg,46.00 mg,7.7 mg,4.00 mcg,0.00 mcg,0.063 mg,0.224 mg,0.097 mg,0.027 mg,46.00 IU,12.00 mcg,0.00 mcg,1.00 mcg,5.00 mcg,7.00 mcg,0,0.13 mcg,0.023 mg,2.3 mg,0.00 IU,0.01 mg,0.01 mg,0.0 mcg,54.00 mg,0.028 mg,0.14 mg,8.00 mg,0.011 mg,40.00 mg,96.00 mg,1.5 mcg,0.48 mg,1.10 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30.40 g,1.3 g,24.32 g,0,0,0,0,0,0,2.00 g,1.160 g,0.530 g,0.080 g,1.00 mg,0.0 g,0.40 g,0.00 mg,0.00 mg,66.10 g
"Cauliflower, raw",100 g,25,0.3g,0.1g,0,30.00 mg,44.3 mg,57.00 mcg,0.00 mcg,0.507 mg,0.667 mg,0.060 mg,0.050 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,1.00 mcg,0,0.00 mcg,0.184 mg,48.2 mg,0.00 IU,0.08 mg,0.08 mg,15.5 mcg,22.00 mg,0.039 mg,0.42 mg,15.00 mg,0.155 mg,44.00 mg,299.00 mg,0.6 mcg,0.27 mg,1.92 g,0.116 g,0.086 g,0.177 g,0.020 g,0.257 g,0.071 g,0.056 g,0,0.071 g,0.106 g,0.217 g,0.020 g,0.065 g,0.071 g,0.086 g,0.076 g,0.020 g,0.051 g,0.125 g,4.97 g,2.0 g,1.91 g,0.97 g,0.00 g,0.94 g,0.00 g,0.00 g,0.00 g,0.28 g,0.130 g,0.034 g,0.031 g,0.00 mg,0.0 g,0.76 g,0.00 mg,0.00 mg,92.07 g
"Taro leaves, raw",100 g,42,0.7g,0.2g,0,3.00 mg,12.8 mg,126.00 mcg,0.00 mcg,1.513 mg,0.084 mg,0.456 mg,0.209 mg,4825.00 IU,241.00 mcg,0.00 mcg,2895.00 mcg,0.00 mcg,1932.00 mcg,0,0.00 mcg,0.146 mg,52.0 mg,0.00 IU,2.02 mg,2.02 mg,108.6 mcg,107.00 mg,0.270 mg,2.25 mg,45.00 mg,0.714 mg,60.00 mg,648.00 mg,0.9 mcg,0.41 mg,4.98 g,0,0.220 g,0,0.064 g,0,0,0.114 g,0,0.260 g,0.392 g,0.246 g,0.079 g,0.195 g,0,0,0.167 g,0.048 g,0.178 g,0.256 g,6.70 g,3.7 g,3.01 g,0,0,0,0,0,0,0.74 g,0.151 g,0.060 g,0.307 g,0.00 mg,0.0 g,1.92 g,0.00 mg,0.00 mg,85.66 g
"Lamb, raw, ground",100 g,282,23g,10g,73mg,59.00 mg,69.3 mg,18.00 mcg,0.00 mcg,5.960 mg,0.650 mg,0.210 mg,0.110 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,2.31 mcg,0.130 mg,0.0 mg,2.00 IU,0.20 mg,0.20 mg,3.6 mcg,16.00 mg,0.101 mg,1.55 mg,21.00 mg,0.019 mg,157.00 mg,222.00 mg,18.8 mcg,3.41 mg,16.56 g,0.996 g,0.984 g,1.457 g,0.198 g,2.402 g,0.809 g,0.524 g,0,0.799 g,1.288 g,1.462 g,0.425 g,0.674 g,0.694 g,0.615 g,0.709 g,0.193 g,0.556 g,0.893 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,23.41 g,10.190 g,9.600 g,1.850 g,73.00 mg,0.0 g,0.87 g,0.00 mg,0.00 mg,59.47 g
"Cheese, camembert",100 g,300,24g,15g,72mg,842.00 mg,15.4 mg,62.00 mcg,0.00 mcg,0.630 mg,1.364 mg,0.488 mg,0.028 mg,820.00 IU,241.00 mcg,0.00 mcg,12.00 mcg,0.00 mcg,0.00 mcg,0,1.30 mcg,0.227 mg,0.0 mg,18.00 IU,0.21 mg,0.21 mg,2.0 mcg,388.00 mg,0.021 mg,0.33 mg,20.00 mg,0.038 mg,347.00 mg,187.00 mg,14.5 mcg,2.38 mg,19.80 g,0.819 g,0.701 g,1.288 g,0.109 g,4.187 g,0.379 g,0.683 g,0,0.968 g,1.840 g,1.766 g,0.565 g,1.105 g,2.346 g,1.114 g,0.717 g,0.307 g,1.145 g,1.279 g,0.46 g,0.0 g,0.46 g,0,0,0,0,0,0,24.26 g,15.259 g,7.023 g,0.724 g,72.00 mg,0.0 g,3.68 g,0.00 mg,0.00 mg,51.80 g
Vegetarian fillets,100 g,290,18g,2.8g,0,490.00 mg,82.0 mg,102.00 mcg,0.00 mcg,12.000 mg,0,0.900 mg,1.100 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,4.20 mcg,1.500 mg,0.0 mg,0.00 IU,3.45 mg,3.45 mg,0.0 mcg,95.00 mg,0.925 mg,2.00 mg,23.00 mg,0,450.00 mg,600.00 mg,1.0 mcg,1.40 mg,23.00 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9.00 g,6.1 g,0.80 g,0,0,0,0,0,0,18.00 g,2.849 g,4.376 g,9.332 g,0.00 mg,0.0 g,5.00 g,0.00 mg,0.00 mg,45.00 g


Just as with Series, `iloc` indexing is zero-based for DataFrames. In the example below, we ask for the fourth row (position 3) and all of its columns.

In [63]:
nutrition.iloc[3]

serving_size      100 g
calories            367
total_fat          2.4g
saturated_fat      0.4g
cholesterol           0
                  ...  
alcohol               0
ash              2.37 g
caffeine              0
theobromine           0
water            8.82 g
Name: Teff, uncooked, Length: 75, dtype: object

We can grab multiple rows as well! To do this, we pass in a list of the row positions we want. We can pass a list for the rows, the columns, or both; to select only columns, use `:` for the rows (for example, `nutrition.iloc[:, [2, 3]]`).

In [64]:
nutrition.iloc[
               [4,6,9]
]

name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
"Sherbet, orange",100 g,144,2g,1.2g,1mg,46.00 mg,7.7 mg,4.00 mcg,0.00 mcg,0.063 mg,0.224 mg,0.097 mg,0.027 mg,46.00 IU,12.00 mcg,0.00 mcg,1.00 mcg,5.00 mcg,7.00 mcg,0,0.13 mcg,0.023 mg,2.3 mg,0.00 IU,0.01 mg,0.01 mg,0.0 mcg,54.00 mg,0.028 mg,0.14 mg,8.00 mg,0.011 mg,40.00 mg,96.00 mg,1.5 mcg,0.48 mg,1.10 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30.40 g,1.3 g,24.32 g,0,0,0,0,0,0,2.00 g,1.160 g,0.530 g,0.080 g,1.00 mg,0.0 g,0.40 g,0.00 mg,0.00 mg,66.10 g
"Taro leaves, raw",100 g,42,0.7g,0.2g,0,3.00 mg,12.8 mg,126.00 mcg,0.00 mcg,1.513 mg,0.084 mg,0.456 mg,0.209 mg,4825.00 IU,241.00 mcg,0.00 mcg,2895.00 mcg,0.00 mcg,1932.00 mcg,0,0.00 mcg,0.146 mg,52.0 mg,0.00 IU,2.02 mg,2.02 mg,108.6 mcg,107.00 mg,0.270 mg,2.25 mg,45.00 mg,0.714 mg,60.00 mg,648.00 mg,0.9 mcg,0.41 mg,4.98 g,0,0.220 g,0,0.064 g,0,0,0.114 g,0,0.260 g,0.392 g,0.246 g,0.079 g,0.195 g,0,0,0.167 g,0.048 g,0.178 g,0.256 g,6.70 g,3.7 g,3.01 g,0,0,0,0,0,0,0.74 g,0.151 g,0.060 g,0.307 g,0.00 mg,0.0 g,1.92 g,0.00 mg,0.00 mg,85.66 g
Vegetarian fillets,100 g,290,18g,2.8g,0,490.00 mg,82.0 mg,102.00 mcg,0.00 mcg,12.000 mg,0,0.900 mg,1.100 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,4.20 mcg,1.500 mg,0.0 mg,0.00 IU,3.45 mg,3.45 mg,0.0 mcg,95.00 mg,0.925 mg,2.00 mg,23.00 mg,0,450.00 mg,600.00 mg,1.0 mcg,1.40 mg,23.00 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9.00 g,6.1 g,0.80 g,0,0,0,0,0,0,18.00 g,2.849 g,4.376 g,9.332 g,0.00 mg,0.0 g,5.00 g,0.00 mg,0.00 mg,45.00 g
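The different shapes of an `iloc` request, sketched on a hypothetical 3-by-3 frame: a list for rows only, `:` plus a list for columns only, lists for both, and a negative position, which counts from the end just as in Python lists.

```python
import pandas as pd

# Hypothetical 3x3 frame
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["r0", "r1", "r2"],
    columns=["c0", "c1", "c2"],
)

rows_only = df.iloc[[0, 2]]          # rows 0 and 2, every column
cols_only = df.iloc[:, [0, 2]]       # every row, columns 0 and 2
both = df.iloc[[0, 2], [1]]          # rows 0 and 2, column 1 only
last_row = df.iloc[-1]               # negative positions count from the end

print(rows_only.shape, cols_only.shape, both.shape)  # (2, 3) (3, 2) (2, 1)
print(last_row.tolist())                               # [7, 8, 9]
```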


Here we grab specific rows *and* specific columns. Remember that `iloc` does not accept labels, so we need to know the integer position of each column.

In [65]:
nutrition.iloc[
               [4,6,9],
               [2,3,4]
]

name,total_fat,saturated_fat,cholesterol
"Sherbet, orange",2g,1.2g,1mg
"Taro leaves, raw",0.7g,0.2g,0
Vegetarian fillets,18g,2.8g,0


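Alternatively, we can use a slice for the columns. As in standard Python slicing, the stop position is excluded, so `2:5` selects positions 2, 3, and 4: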

In [66]:
nutrition.iloc[
               [4,6,9],
               2:5
]

name,total_fat,saturated_fat,cholesterol
"Sherbet, orange",2g,1.2g,1mg
"Taro leaves, raw",0.7g,0.2g,0
Vegetarian fillets,18g,2.8g,0
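A quick check, on a hypothetical one-row frame, that a positional slice and the corresponding explicit list of positions select exactly the same columns, which is why the two cells above produce identical output.

```python
import pandas as pd

# Hypothetical one-row frame with six columns named a..f
df = pd.DataFrame([[0, 1, 2, 3, 4, 5]], columns=list("abcdef"))

# A positional slice stops *before* its end position, like Python's range()
sliced = df.iloc[:, 2:5]
listed = df.iloc[:, [2, 3, 4]]

print(sliced.columns.tolist())  # ['c', 'd', 'e']
print(sliced.equals(listed))    # True
```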


Both the `loc` and `iloc` indexers support boolean masks as well. Recall that with Series, we had to provide a sequence of booleans exactly as long as the Series being filtered.

With two-dimensional DataFrames, we can provide *two lists* of booleans: the first must be as long as the number of rows, and the second as long as the number of columns.

In this example, we create two lists of booleans that keep only the rows and columns at even positions (0, 2, 4, ...). As a result, the new DataFrame has about half the rows and half the columns, or roughly 1/4 as many values.

In [67]:
evens_nutrition = nutrition.iloc[
    [i % 2 == 0 for i in range(nutrition.shape[0])],  # True for even row positions
    [i % 2 == 0 for i in range(nutrition.shape[1])]   # True for even column positions
]
evens_nutrition

name,serving_size,total_fat,cholesterol,choline,folic_acid,pantothenic_acid,thiamin,vitamin_a_rae,carotene_beta,lutein_zeaxanthin,vitamin_b12,vitamin_c,vitamin_e,vitamin_k,copper,magnesium,phosphorous,selenium,protein,arginine,cystine,glycine,hydroxyproline,leucine,methionine,proline,threonine,tyrosine,carbohydrate,sugars,galactose,lactose,sucrose,saturated_fatty_acids,polyunsaturated_fatty_acids,alcohol,caffeine,water
Cornstarch,100 g,0.1g,0,0.4 mg,0.00 mcg,0.000 mg,0.000 mg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.0 mg,0.00 mg,0.0 mcg,0.050 mg,3.00 mg,13.00 mg,2.8 mcg,0.26 g,0.012 g,0.006 g,0.009 g,0,0.036 g,0.006 g,0.024 g,0.009 g,0.010 g,91.27 g,0.00 g,0,0,0,0.009 g,0.025 g,0.0 g,0.00 mg,8.32 g
"Eggplant, raw",100 g,0.2g,0,6.9 mg,0.00 mcg,0.281 mg,0.039 mg,1.00 mcg,14.00 mcg,36.00 mcg,0.00 mcg,2.2 mg,0.30 mg,3.5 mcg,0.081 mg,14.00 mg,24.00 mg,0.3 mcg,0.98 g,0.057 g,0.006 g,0.041 g,0,0.064 g,0.011 g,0.043 g,0.037 g,0.027 g,5.88 g,3.53 g,0,0,0.26 g,0.034 g,0.076 g,0.0 g,0.00 mg,92.30 g
"Sherbet, orange",100 g,2g,1mg,7.7 mg,0.00 mcg,0.224 mg,0.027 mg,12.00 mcg,1.00 mcg,7.00 mcg,0.13 mcg,2.3 mg,0.01 mg,0.0 mcg,0.028 mg,8.00 mg,40.00 mg,1.5 mcg,1.10 g,0,0,0,0,0,0,0,0,0,30.40 g,24.32 g,0,0,0,1.160 g,0.080 g,0.0 g,0.00 mg,66.10 g
"Taro leaves, raw",100 g,0.7g,0,12.8 mg,0.00 mcg,0.084 mg,0.209 mg,241.00 mcg,2895.00 mcg,1932.00 mcg,0.00 mcg,52.0 mg,2.02 mg,108.6 mcg,0.270 mg,45.00 mg,60.00 mg,0.9 mcg,4.98 g,0.220 g,0.064 g,0,0,0.392 g,0.079 g,0,0.167 g,0.178 g,6.70 g,3.01 g,0,0,0,0.151 g,0.307 g,0.0 g,0.00 mg,85.66 g
"Cheese, camembert",100 g,24g,72mg,15.4 mg,0.00 mcg,1.364 mg,0.028 mg,241.00 mcg,12.00 mcg,0.00 mcg,1.30 mcg,0.0 mg,0.21 mg,2.0 mcg,0.021 mg,20.00 mg,347.00 mg,14.5 mcg,19.80 g,0.701 g,0.109 g,0.379 g,0,1.840 g,0.565 g,2.346 g,0.717 g,1.145 g,0.46 g,0.46 g,0,0,0,15.259 g,0.724 g,0.0 g,0.00 mg,51.80 g
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Beef, raw, select, trimmed to 1/8"" fat, separable lean only, lip-on, boneless, rib eye steak/roast",100 g,6.4g,70mg,49.4 mg,0.00 mcg,0.530 mg,0.100 mg,2.00 mcg,0.00 mcg,0.00 mcg,1.75 mcg,0.0 mg,0.10 mg,1.5 mcg,0.084 mg,23.00 mg,160.00 mg,27.0 mcg,22.55 g,1.810 g,0.271 g,1.154 g,0.114 g,2.318 g,0.712 g,1.142 g,1.255 g,1.007 g,0.00 g,0.00 g,0,0,0,2.313 g,0.396 g,0.0 g,0.00 mg,70.89 g
"Oil, uses similar to 95 degree hard butter, confection fat, palm kernel (hydrogenated), industrial",100 g,100g,0,0.2 mg,0.00 mcg,0.000 mg,0.000 mg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.0 mg,3.81 mg,24.7 mcg,0.000 mg,0.00 mg,0.00 mg,0.0 mcg,0.00 g,0.000 g,0.000 g,0.000 g,0,0.000 g,0.000 g,0.000 g,0.000 g,0.000 g,0.00 g,0.00 g,0,0,0,93.701 g,0.000 g,0.0 g,0.00 mg,0.05 g
"Beef, raw, all grades, trimmed to 0"" fat, separable lean and fat, boneless, top round roast, round",100 g,3.5g,62mg,64.5 mg,0.00 mcg,0.356 mg,0.063 mg,3.00 mcg,0.00 mcg,0.00 mcg,1.64 mcg,0.0 mg,0.23 mg,1.5 mcg,0.048 mg,12.00 mg,219.00 mg,22.1 mcg,23.45 g,1.597 g,0.239 g,1.154 g,0.160 g,2.021 g,0.635 g,1.052 g,1.105 g,0.874 g,0.00 g,0.00 g,0,0,0,1.353 g,0.244 g,0.0 g,0.00 mg,72.51 g
"Lamb, raw, separable lean and fat, composite of trimmed retail cuts, frozen, imported, New Zealand",100 g,23g,78mg,0,0.00 mcg,0.520 mg,0.130 mg,0.00 mcg,0,0,2.42 mcg,0.0 mg,0.21 mg,0,0.083 mg,15.00 mg,168.00 mg,1.3 mcg,16.74 g,0.994 g,0.200 g,0.818 g,0,1.302 g,0.430 g,0.702 g,0.716 g,0.563 g,0.00 g,0,0,0,0,11.570 g,0.980 g,0,0,59.80 g


In [68]:
evens_nutrition.shape

(4395, 38)
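Counting even positions explains this shape: positions 0 through 8788 contain 4395 even numbers, and positions 0 through 74 contain 38. As an aside, the same even-position selection can be written more compactly with step slices (`::2`). The sketch below checks on a hypothetical 5-by-5 frame that the boolean-mask form and the step-slice form agree.

```python
import numpy as np
import pandas as pd

# Hypothetical 5x5 frame standing in for the nutrition data
df = pd.DataFrame(np.arange(25).reshape(5, 5))

# Boolean-mask approach from the cell above, with lengths taken from df.shape
mask_version = df.iloc[
    [i % 2 == 0 for i in range(df.shape[0])],
    [i % 2 == 0 for i in range(df.shape[1])],
]

# Equivalent step slices: every second row and every second column, starting at 0
slice_version = df.iloc[::2, ::2]

print(slice_version.shape)                 # (3, 3)
print(mask_version.equals(slice_version))  # True
```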

What if we want to extract just ONE specific value using `iloc`? Easy enough: identify the (row, column) coordinates of that value, then pass them to `iloc`.
* Pandas also has some built-in attributes that perform single-value extraction much more quickly than `iloc`. We'll cover these in the next lecture.

In [69]:
nutrition.iloc[9,1]

290

## Single Value Access With `.at` and `.iat`

Pandas offers two highly efficient alternative indexers for extracting single values: `.at` and `.iat`.

`.at` is an alternative to `.loc` and, like it, uses label-based lookup. It only works for accessing single values: you cannot pass in lists of index or column labels.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.at.html

In [70]:
nutrition.at['Nuts, pecans', 'calories']

691

In [71]:
nutrition.loc['Nuts, pecans', 'calories']

691

Similarly, we use `.iat` as an alternative to `.iloc`.

In [72]:
nutrition.iat[1,1]

691
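A side-by-side sketch on a hypothetical two-row frame (calorie values copied from the outputs above): `.at` takes a row label and a column label, `.iat` takes two integer positions, and both can also set a single value in place.

```python
import pandas as pd

# Hypothetical two-row stand-in for the nutrition DataFrame
df = pd.DataFrame(
    {"serving_size": ["100 g", "100 g"], "calories": [381, 691]},
    index=["Cornstarch", "Nuts, pecans"],
)

# .at uses labels, .iat uses integer positions; both return one scalar
print(df.at["Nuts, pecans", "calories"])  # 691
print(df.iat[1, 1])                        # 691

# Both can also set a single value in place
df.at["Cornstarch", "calories"] = 380
print(df.iat[0, 1])                        # 380
```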

Why use `.at` or `.iat` instead of `.loc` and `.iloc`, which can do the exact same thing? Because `.at` and `.iat` are specialized, single-purpose, and faster. By contrast, `.loc` and `.iloc` carry more computational "overhead" because they must interpret many possible kinds of input (labels, lists, slices, boolean masks) and respond accordingly.

Observe this comparison using IPython's `%timeit` magic, which is built on Python's `timeit` module:
* https://docs.python.org/3/library/timeit.html

In [73]:
%timeit nutrition.loc['Nuts, pecans', 'calories']

The slowest run took 12.40 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 8.15 µs per loop


In [74]:
%timeit nutrition.at['Nuts, pecans', 'calories']

The slowest run took 17.02 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 4.52 µs per loop


Obviously, for a small dataset these differences are minuscule: here `.at` is only about 3.6 microseconds faster than `.loc` per lookup (4.52 µs vs. 8.15 µs). But the savings add up when you perform many individual lookups, for example inside a loop over a very large dataset!
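Outside IPython, a similar comparison can be sketched with the standard-library `timeit` module directly. The DataFrame below is synthetic (the `food_{i}` labels and the column names are made up for illustration), and absolute timings depend on your machine and pandas version, so no output is shown.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical DataFrame: 10,000 rows, 3 integer columns, string row labels
df = pd.DataFrame(
    np.arange(30_000).reshape(10_000, 3),
    index=[f"food_{i}" for i in range(10_000)],
    columns=["calories", "protein", "fat"],
)

n = 10_000  # number of lookups timed for each indexer
t_loc = timeit.timeit(lambda: df.loc["food_42", "calories"], number=n)
t_at = timeit.timeit(lambda: df.at["food_42", "calories"], number=n)

# timeit returns total seconds; convert to microseconds per lookup
print(f".loc: {t_loc / n * 1e6:.2f} µs per lookup")
print(f".at:  {t_at / n * 1e6:.2f} µs per lookup")
```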

## BONUS - The `get_loc()` Method

When working with position-based extraction approaches, you may find yourself counting columns and rows to find the exact integer location. The same goes for slicing. This is especially the case when we combine labels with integer positions to find what we want.

For example, say we're interested in how much Vitamin K the third food in the dataset has. We know the column label is `vitamin_k` and the index position is `2` (positions start at zero). However, each of the methods we have learned so far takes either labels or positions, not a mix of both!

We can either convert the position into a label and then use a label-based accessor, or convert the label into a position and then use a position-based accessor.


### Approach #1: Getting the index label from the integer position
Start by isolating the index and obtaining the label that corresponds to the third index position (the third row).

In [75]:
index_label = nutrition.index[2]

Next, assign the column label. In this case we want `vitamin_k`

In [76]:
column_label = 'vitamin_k'

Finally, use `.loc` with both of your labels to get the datapoint that you want.

In [77]:
nutrition.loc[index_label, column_label]

'3.5 mcg'

Alternatively, since we're retrieving a single value, we can also use the `.at` attribute, which is computationally more efficient.

In [78]:
nutrition.at[index_label, column_label]

'3.5 mcg'

### Approach #2: Determine the integer location from the label
Start by isolating the columns and obtaining the integer position that corresponds to the column you want, in this case `vitamin_k`.

We can do this using the `get_loc()` method, which returns the integer location, slice, or boolean mask of the requested label.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.get_loc.html
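Which of the three return types you get depends on the index. A unique index gives a plain integer, which is the case for `nutrition`'s columns. A sorted index with duplicates gives a slice, and an unsorted index with duplicates gives a boolean mask. The small indexes below are made up purely to show all three cases.

```python
import pandas as pd

unique = pd.Index(list("abc"))
monotonic_dupes = pd.Index(list("abbc"))  # duplicates, but sorted
unsorted_dupes = pd.Index(list("abcb"))   # duplicates, not sorted

print(unique.get_loc("b"))           # 1
print(monotonic_dupes.get_loc("b"))  # slice(1, 3, None)
print(unsorted_dupes.get_loc("b"))   # [False  True False  True]
```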

In [79]:
column_loc = nutrition.columns.get_loc('vitamin_k')
column_loc

26

Next, assign the index location. This is equal to `2`, since we want the third entry in the DataFrame.

In [80]:
index_loc = 2

Finally, use either `iloc` or `iat` to get the value!

In [81]:
print(nutrition.iloc[index_loc, column_loc])
print(nutrition.iat[index_loc, column_loc])

3.5 mcg
3.5 mcg
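Side by side, the two approaches look like this. The sketch assumes a hypothetical three-row, two-column stand-in for `nutrition` (the `vitamin_k` strings are copied from the output above). Both routes should land on the same cell.

```python
import pandas as pd

# Hypothetical stand-in: the first three foods and two of nutrition's columns
foods = pd.DataFrame(
    {"calories": [381, 691, 25], "vitamin_k": ["0.0 mcg", "3.5 mcg", "3.5 mcg"]},
    index=["Cornstarch", "Nuts, pecans", "Eggplant, raw"],
)
row_position = 2  # third food
column_label = "vitamin_k"

# Approach 1: turn the position into a label, then use a label-based accessor
by_label = foods.at[foods.index[row_position], column_label]

# Approach 2: turn the label into a position, then use a position-based accessor
by_position = foods.iat[row_position, foods.columns.get_loc(column_label)]

assert by_label == by_position
print(by_label)  # 3.5 mcg
```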


## More Cleanup: Going Numeric

In its current structure, our DataFrame is not conducive to numerical analysis. Why? Because although the values in these columns may look like numbers, nearly all of these columns contain values that are *strings*.

In [82]:
nutrition.head(5)

name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
Cornstarch,100 g,381,0.1g,,0,9.00 mg,0.4 mg,0.00 mcg,0.00 mcg,0.000 mg,0.000 mg,0.000 mg,0.000 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,0.00 mcg,0.000 mg,0.0 mg,0.00 IU,0.00 mg,0.00 mg,0.0 mcg,2.00 mg,0.050 mg,0.47 mg,3.00 mg,0.053 mg,13.00 mg,3.00 mg,2.8 mcg,0.06 mg,0.26 g,0.019 g,0.012 g,0.020 g,0.006 g,0.053 g,0.009 g,0.008 g,0,0.010 g,0.036 g,0.006 g,0.006 g,0.013 g,0.024 g,0.012 g,0.009 g,0.001 g,0.010 g,0.014 g,91.27 g,0.9 g,0.00 g,0,0,0,0,0,0,0.05 g,0.009 g,0.016 g,0.025 g,0.00 mg,0.0 g,0.09 g,0.00 mg,0.00 mg,8.32 g
"Nuts, pecans",100 g,691,72g,6.2g,0,0.00 mg,40.5 mg,22.00 mcg,0.00 mcg,1.167 mg,0.863 mg,0.130 mg,0.660 mg,56.00 IU,3.00 mcg,0.00 mcg,29.00 mcg,9.00 mcg,17.00 mcg,0,0.00 mcg,0.210 mg,1.1 mg,0.00 IU,1.40 mg,1.40 mg,3.5 mcg,70.00 mg,1.200 mg,2.53 mg,121.00 mg,4.500 mg,277.00 mg,410.00 mg,3.8 mcg,4.53 mg,9.17 g,0.397 g,1.177 g,0.929 g,0.152 g,1.829 g,0.453 g,0.262 g,0,0.336 g,0.598 g,0.287 g,0.183 g,0.426 g,0.363 g,0.474 g,0.306 g,0.093 g,0.215 g,0.411 g,13.86 g,9.6 g,3.97 g,0.04 g,0,0.04 g,0.00 g,0.00 g,3.90 g,71.97 g,6.180 g,40.801 g,21.614 g,0.00 mg,0.0 g,1.49 g,0.00 mg,0.00 mg,3.52 g
"Eggplant, raw",100 g,25,0.2g,,0,2.00 mg,6.9 mg,22.00 mcg,0.00 mcg,0.649 mg,0.281 mg,0.037 mg,0.039 mg,23.00 IU,1.00 mcg,0.00 mcg,14.00 mcg,0.00 mcg,36.00 mcg,0,0.00 mcg,0.084 mg,2.2 mg,0.00 IU,0.30 mg,0.30 mg,3.5 mcg,9.00 mg,0.081 mg,0.23 mg,14.00 mg,0.232 mg,24.00 mg,229.00 mg,0.3 mcg,0.16 mg,0.98 g,0.051 g,0.057 g,0.164 g,0.006 g,0.186 g,0.041 g,0.023 g,0,0.045 g,0.064 g,0.047 g,0.011 g,0.043 g,0.043 g,0.042 g,0.037 g,0.009 g,0.027 g,0.053 g,5.88 g,3.0 g,3.53 g,1.54 g,0,1.58 g,0,0,0.26 g,0.18 g,0.034 g,0.016 g,0.076 g,0.00 mg,0.0 g,0.66 g,0.00 mg,0.00 mg,92.30 g
"Teff, uncooked",100 g,367,2.4g,0.4g,0,12.00 mg,13.1 mg,0,0,3.363 mg,0.942 mg,0.270 mg,0.390 mg,9.00 IU,0.00 mcg,0.00 mcg,5.00 mcg,0.00 mcg,66.00 mcg,0,0,0.482 mg,0,0,0.08 mg,0.08 mg,1.9 mcg,180.00 mg,0.810 mg,7.63 mg,184.00 mg,9.240 mg,429.00 mg,427.00 mg,4.4 mcg,3.63 mg,13.30 g,0.747 g,0.517 g,0.820 g,0.236 g,3.349 g,0.477 g,0.301 g,0,0.501 g,1.068 g,0.376 g,0.428 g,0.698 g,0.664 g,0.622 g,0.510 g,0.139 g,0.458 g,0.686 g,73.13 g,8.0 g,1.84 g,0.47 g,0.00 g,0.73 g,0.00 g,0.01 g,0.62 g,2.38 g,0.449 g,0.589 g,1.071 g,0,0,2.37 g,0,0,8.82 g
"Sherbet, orange",100 g,144,2g,1.2g,1mg,46.00 mg,7.7 mg,4.00 mcg,0.00 mcg,0.063 mg,0.224 mg,0.097 mg,0.027 mg,46.00 IU,12.00 mcg,0.00 mcg,1.00 mcg,5.00 mcg,7.00 mcg,0,0.13 mcg,0.023 mg,2.3 mg,0.00 IU,0.01 mg,0.01 mg,0.0 mcg,54.00 mg,0.028 mg,0.14 mg,8.00 mg,0.011 mg,40.00 mg,96.00 mg,1.5 mcg,0.48 mg,1.10 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30.40 g,1.3 g,24.32 g,0,0,0,0,0,0,2.00 g,1.160 g,0.530 g,0.080 g,1.00 mg,0.0 g,0.40 g,0.00 mg,0.00 mg,66.10 g


If you try to add the values together, you get one long concatenated string, just as you would when adding strings with `+` in Python.

In [83]:
nutrition.total_fat.sum()

'0.1g72g0.2g2.4g2g0.3g0.7g23g24g18g0g0.4g0.1g7.2g15g4.5g16g0.4g0.9g9.2g0.5g1.7g5.9g33g15g0g0g3g0.4g68g1.5g0.2g11g0.8g16g0.2g0.5g50g0.3g0.6g3g16g3.1g11g2.4g0g0.3g0.5g99g0.1g5.3g5.3g0.4g5.3g0.2g0.3g6.4g22g1g14g10g17g14g4.3g22g36g0.7g100g27g2.8g5.8g14g0.7g1.5g11g34g0.2g0.9g4.6g0g0.4g19g0.2g10g28g6.7g6.7g0.3g0.3g44g37g14g50g0.9g0.7g0.1g0.7g8g10g19g9.2g0g0.2g8.6g1.3g14g0.4g0.2g8.2g5.2g3.3g2.1g0.1g80g0.5g1.4g0.4g3.3g0.2g0.7g0.1g9.4g0.3g0.6g29g0.2g1.4g0.2g0.4g1.2g1.8g0g0.9g0.2g1.4g0.8g1.5g9.8g0g13g16g19g7.4g0.2g5.2g29g0.3g9.9g22g14g15g4.1g0.5g3.5g15g20g32g1g81g0.2g1.6g0g17g22g7.1g7.4g0.2g8.7g1.4g34g0.3g6.3g30g8.1g0.2g0.1g3.7g0.6g0.3g3.9g0.3g1.2g29g14g26g1.1g2g13g0g9g22g3.7g100g0.1g2.1g2.1g0.3g0.2g6.8g8.1g8.3g2.4g0.2g23g6.7g1g0.3g0.2g0.1g12g17g0.1g100g3.4g0.2g6.7g1.5g22g0g1g25g34g3.6g100g7.3g9g1.6g0.4g0.3g18g0.4g9.5g1.4g11g3.1g1.5g0g2.4g3.6g0g15g1.2g6.6g0.7g4.2g15g0.1g0.5g0.2g0.1g3.5g3g0.1g0.4g0.1g0.3g100g31g2.8g9.7g2.3g11g1.7g0.1g1.7g0.5g4.7g0.5g0.2g8.6g18g25g0.5g0.5g26g7.7g0.1g0.1g100g1.1g16
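A minimal reproduction on just three values (copied from the `total_fat` output above) makes the behavior easier to see. The Series is created with `dtype=object` to mirror the `object` columns in `nutrition`.

```python
import pandas as pd

# Three total_fat strings copied from the nutrition output above
total_fat = pd.Series(["0.1g", "72g", "0.2g"], dtype=object)
print(total_fat.dtype)  # object
print(total_fat.sum())  # 0.1g72g0.2g
```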

In order to do any math on this, we need to convert the data to a *numeric* type. Using the `.info()` method, we can see that most of the columns have the `object` datatype instead of `int` or `float`
*  https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html

In [84]:
nutrition.info(verbose = False)

<class 'pandas.core.frame.DataFrame'>
Index: 8789 entries, Cornstarch to Beef, raw, all grades, trimmed to 0" fat, separable lean only, boneless, eye of round steak, round
Columns: 75 entries, serving_size to water
dtypes: int64(2), object(73)
memory usage: 5.4+ MB


## The `astype()` Method

There is a very useful built-in method called `astype()` that allows us to cast a given DataFrame or Series from one datatype to another.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html

Let's first explore this method in isolation. Start by creating a new small DataFrame with three columns. Notice that *age* and *weight* are numeric (int and float, respectively) while *height* is a string (or more accurately, an object).

In [85]:
df = pd.DataFrame({
    'age': [12, 13, 14, 16],
    'weight': [41.1, 34.5, 83.2, 90.1],
    'height': ['1.72', '1.74', '1.91', '1.54']
})
df

,age,weight,height
0,12,41.1,1.72
1,13,34.5,1.74
2,14,83.2,1.91
3,16,90.1,1.54


Confirm the datatypes using the `info()` method

In [86]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   age     4 non-null      int64  
 1   weight  4 non-null      float64
 2   height  4 non-null      object 
dtypes: float64(1), int64(1), object(1)
memory usage: 224.0+ bytes


We can now use the `astype()` method to convert all columns to one common datatype. In this case we'll use **float**.

In [87]:
df = df.astype(float)
df

,age,weight,height
0,12.0,41.1,1.72
1,13.0,34.5,1.74
2,14.0,83.2,1.91
3,16.0,90.1,1.54


To check whether it worked, use the `info()` method again.

In [88]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   age     4 non-null      float64
 1   weight  4 non-null      float64
 2   height  4 non-null      float64
dtypes: float64(3)
memory usage: 224.0 bytes


The `astype()` method is also quite flexible. We can cast datatypes based on specific columns by passing in a dictionary whose keys are column names and values are datatypes.
* This works with both Python types and NumPy dtypes
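A small sketch of mixing the three ways of naming a dtype in one dictionary, using a hypothetical DataFrame like `df` above. Columns left out of the dictionary would keep their current dtype. Python's `int` maps to the platform's default integer (usually `int64`), so the check below only asserts that `age` is *some* integer type.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({
    "age": [12.0, 13.0],
    "weight": [41.1, 34.5],
    "height": ["1.72", "1.74"],
})

# Keys are column names; values may be Python types, NumPy dtypes, or dtype strings
people = people.astype({"age": int, "weight": np.float32, "height": "float64"})
print(people.dtypes)
```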

In [89]:
df = df.astype({'age' : int})

In [90]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   age     4 non-null      int64  
 1   weight  4 non-null      float64
 2   height  4 non-null      float64
dtypes: float64(2), int64(1)
memory usage: 224.0 bytes


Now let's try to solve our problem with the `nutrition` DataFrame. Start by selecting the first four rows and all columns, and attempt to cast them to **floats**.

In [91]:
# nutrition.iloc[0:4, :].astype(float)  # commented out because it raises a ValueError

Notice how this does NOT work. The reason is that the values are strings that contain things other than numbers, namely letters. Pandas does not know how to convert "100 g" to the number 100.

For that, we will need to use a different method and regular expressions in order to remove the letters from the values.
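To see the failure without the full dataset, here is a two-value sketch (the Series is a hypothetical stand-in for the `serving_size` column). Catching the `ValueError` lets the script continue and report what went wrong.

```python
import pandas as pd

serving_size = pd.Series(["100 g", "100 g"], dtype=object)

try:
    serving_size.astype(float)
except ValueError as err:
    # astype cannot parse the " g" suffix, so it raises instead of guessing
    print("astype failed:", err)
    failed = True
else:
    failed = False
```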

## DataFrame `replace()` + A Glimpse at Regex

We can't use `astype()` as a one-stop solution to convert our values the way we want to. Instead, we'll need to separate the numerical values from the letters they are attached to, and then apply `astype()` to the numerical portion.

For this, we can use the `replace()` method. This intuitive method replaces one thing with another.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html

Let's start with some basic practice with `replace()`. Let's obtain the first six rows of our **nutrition** DataFrame and the first column.

In [92]:
dfm = nutrition.iloc[:6,0:1]
dfm

name,serving_size
Cornstarch,100 g
"Nuts, pecans",100 g
"Eggplant, raw",100 g
"Teff, uncooked",100 g
"Sherbet, orange",100 g
"Cauliflower, raw",100 g


The *serving_size* column has the `object` datatype because its values are all strings.

In [93]:
nutrition.iloc[:6,0:1].info()

<class 'pandas.core.frame.DataFrame'>
Index: 6 entries, Cornstarch to Cauliflower, raw
Data columns (total 1 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   serving_size  6 non-null      object
dtypes: object(1)
memory usage: 96.0+ bytes


`replace()` works by passing the value you want to replace to the `to_replace` parameter, and the value you want to replace it with to the `value` parameter.
* Note that you can also pass in `to_replace` and `value` parameters positionally, as they are the first two positional parameters for the method
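A short sketch on a hypothetical three-row version of `dfm` (with one extra `"50 g"` row added for contrast) confirms that the keyword and positional forms are equivalent. It also shows that only cells exactly equal to `"100 g"` change.

```python
import pandas as pd

dfm = pd.DataFrame({"serving_size": ["100 g", "100 g", "50 g"]}, dtype=object)

keyword = dfm.replace(to_replace="100 g", value=100)
positional = dfm.replace("100 g", 100)  # same call, arguments passed by position

assert keyword.equals(positional)
# Only cells that exactly equal "100 g" change; "50 g" is left alone
print(keyword["serving_size"].tolist())  # [100, 100, '50 g']
```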

In [94]:
dfm.replace(to_replace='100 g', value = 100)

name,serving_size
Cornstarch,100
"Nuts, pecans",100
"Eggplant, raw",100
"Teff, uncooked",100
"Sherbet, orange",100
"Cauliflower, raw",100


Let's check the datatype of this replaced column. Sure enough, we have converted these values to integers.

In [95]:
dfm.replace(to_replace='100 g', value = 100).info()

<class 'pandas.core.frame.DataFrame'>
Index: 6 entries, Cornstarch to Cauliflower, raw
Data columns (total 1 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   serving_size  6 non-null      int64
dtypes: int64(1)
memory usage: 96.0+ bytes


But there's an issue: using `replace()` in this way is very targeted. That is, it depends on EXACT STRINGS. It would be more useful to replace the *g* with nothing, leaving just the numerical portion. Let's try it:

In [96]:
dfm.replace('g', '')

name,serving_size
Cornstarch,100 g
"Nuts, pecans",100 g
"Eggplant, raw",100 g
"Teff, uncooked",100 g
"Sherbet, orange",100 g
"Cauliflower, raw",100 g


This clearly did not work. The reason is that, by default, the method looks for values that match *g* exactly. No such value exists in our DataFrame, as each instance of *g* is part of a larger string. Thus, the method just returns a copy of the same DataFrame.

Instead, we can use *regular expressions* to target and replace certain patterns of strings, not the exact strings themselves. 

The `replace()` method remains the same. But now we invoke the `regex` parameter and set it to `True`, telling the method that we want the `to_replace` and `value` parameters to be interpreted as regular expressions.

What regex should we use for this? Remember that we want to remove the space after the number as well as the g after the space, and replace that selection with nothing. That selection is matched by the regular expression `\sg`, where `\s` stands for any single whitespace character.

In [97]:
dfm.replace(r'\sg', '', regex = True)

name,serving_size
Cornstarch,100
"Nuts, pecans",100
"Eggplant, raw",100
"Teff, uncooked",100
"Sherbet, orange",100
"Cauliflower, raw",100


Voilà, the ` g` is gone! We are now ready to cast this entire simple DataFrame to `int` using `astype()`.

In [98]:
dfm.replace(r'\sg', '', regex = True).astype(int)

name,serving_size
Cornstarch,100
"Nuts, pecans",100
"Eggplant, raw",100
"Teff, uncooked",100
"Sherbet, orange",100
"Cauliflower, raw",100
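The whole exact-match versus regex contrast fits in one self-contained sketch, assuming a hypothetical three-row `dfm`. The pattern is written as a raw string (`r"\sg"`) so that Python hands the backslash to the regex engine unchanged instead of treating `\s` as a string escape. The same pattern also works with Python's own `re.sub`.

```python
import re

import pandas as pd

dfm = pd.DataFrame({"serving_size": ["100 g", "100 g", "50 g"]}, dtype=object)

# Without regex=True, "g" must match an entire cell, so nothing changes
unchanged = dfm.replace("g", "")
assert unchanged.equals(dfm)

# With regex=True, r"\sg" matches a whitespace character followed by "g" anywhere in a cell
stripped = dfm.replace(r"\sg", "", regex=True).astype(int)
print(stripped["serving_size"].tolist())  # [100, 100, 50]

# The same pattern applied to a single string with Python's re module
print(re.sub(r"\sg", "", "100 g"))  # 100
```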


But are we ready to scale this sucker up to the entire **nutrition** DataFrame? Not really, because we have many patterns that need to be replaced. They include:
* g (preceded by a space)
* g (with no space)
* mg
* mcg
* IU

We would need to identify and replace ALL of these patterns! `replace()` can be used for this, but there are better ways that we will see later on. Also, do we REALLY want to get rid of the units entirely? What if we forget what the units were for each column? We probably want to keep the units somewhere, and one good place to stash them is in the column names!
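For the record, one possible single-pattern approach with `replace()` is a regex alternation that strips any of the units listed above from the end of each string. This is only a sketch on hypothetical sample values. It throws the units away, which is exactly what the next part avoids by collecting them first.

```python
import pandas as pd

# Hypothetical sample covering the unit patterns listed above
values = pd.DataFrame(
    {"amount": ["100 g", "0.1g", "9.00 mg", "3.5 mcg", "56.00 IU", "0"]}, dtype=object
)

# Optional whitespace, then one of the units, anchored to the end of the string;
# values with no unit (like "0") simply don't match and are left as-is
numeric = values.replace(r"\s*(?:mcg|mg|g|IU)$", "", regex=True).astype(float)
print(numeric["amount"].tolist())  # [100.0, 0.1, 9.0, 3.5, 56.0, 0.0]
```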

## Part 1 of Replacing the Values - Collecting the Units

In this effort, we will extract the units from each column. Yes, it will feel tedious and challenging, but that's a good thing.

So how do we isolate the units? Let's start by removing all the numerical stuff. If we remove the numbers, only the units will remain.

One way to do this is to use the `replace()` method with a regex pattern that identifies the numeric portion of each string and removes it.

Let's grab a sample of our DataFrame to see what we're dealing with.

In [99]:
nutrition.sample(20, axis = 1).head()

name,carotene_beta,serine,thiamin,methionine,vitamin_a,caffeine,carotene_alpha,threonine,glutamic_acid,cholesterol,phosphorous,sugars,irom,vitamin_k,zink,sucrose,riboflavin,tyrosine,magnesium,phenylalanine
Cornstarch,0.00 mcg,0.012 g,0.000 mg,0.006 g,0.00 IU,0.00 mg,0.00 mcg,0.009 g,0.053 g,0,13.00 mg,0.00 g,0.47 mg,0.0 mcg,0.06 mg,0,0.000 mg,0.010 g,3.00 mg,0.013 g
"Nuts, pecans",29.00 mcg,0.474 g,0.660 mg,0.183 g,56.00 IU,0.00 mg,0.00 mcg,0.306 g,1.829 g,0,277.00 mg,3.97 g,2.53 mg,3.5 mcg,4.53 mg,3.90 g,0.130 mg,0.215 g,121.00 mg,0.426 g
"Eggplant, raw",14.00 mcg,0.042 g,0.039 mg,0.011 g,23.00 IU,0.00 mg,0.00 mcg,0.037 g,0.186 g,0,24.00 mg,3.53 g,0.23 mg,3.5 mcg,0.16 mg,0.26 g,0.037 mg,0.027 g,14.00 mg,0.043 g
"Teff, uncooked",5.00 mcg,0.622 g,0.390 mg,0.428 g,9.00 IU,0,0.00 mcg,0.510 g,3.349 g,0,429.00 mg,1.84 g,7.63 mg,1.9 mcg,3.63 mg,0.62 g,0.270 mg,0.458 g,184.00 mg,0.698 g
"Sherbet, orange",1.00 mcg,0,0.027 mg,0,46.00 IU,0.00 mg,0.00 mcg,0,0,1mg,40.00 mg,24.32 g,0.14 mg,0.0 mcg,0.48 mg,0,0.097 mg,0,8.00 mg,0


We've got a hodgepodge of different values. Some have no letters. Some have *g*. Some have *mg*. Others have international units (*IU*). Some have spaces between the number and the unit. Others do not. 

Which regular expression will select ALL of the non-unit characters? We can use a **negated character set**: square brackets that begin with `^` match any single character that is *not* listed inside them. Our regex is
`\s*[^a-z+A-Z+]`. It matches optional whitespace (`\s*`) followed by one character that is not a letter A through Z (lowercase or capitalized). Replacing every match with a blank deletes the digits, decimal points, and spaces, while the letters survive. Put another way, "keep all letters, remove everything else". (Inside a character set, `+` is a literal plus sign rather than a quantifier, so as long as the data contains no `+` characters, `[^a-zA-Z]` behaves the same.)
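To check the pattern before running it on the whole DataFrame, you can apply it to a handful of sample strings (copied from the outputs above) with Python's `re.sub`, which removes every match.

```python
import re

pattern = r"\s*[^a-z+A-Z+]"
samples = ["0.00 mcg", "9.00 mg", "56.00 IU", "0.1g", "0"]

# re.sub removes every match: each non-letter character (plus any whitespace before it)
units = [re.sub(pattern, "", s) for s in samples]
print(units)  # ['mcg', 'mg', 'IU', 'g', '']

# The simpler negated set gives the same result on these samples
assert units == [re.sub(r"[^a-zA-Z]", "", s) for s in samples]
```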

In [100]:
nutrition.replace(r'\s*[^a-z+A-Z+]', '', regex = True).head(5)

name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
Cornstarch,g,381,g,,,mg,mg,mcg,mcg,mg,mg,mg,mg,IU,mcg,mcg,mcg,mcg,mcg,0,mcg,mg,mg,IU,mg,mg,mcg,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,g,g,g,g,g,g,g,,g,g,g,g,g,g,g,g,g,g,g,g,g,g,,,,,,,g,g,g,g,mg,g,g,mg,mg,g
"Nuts, pecans",g,691,g,g,,mg,mg,mcg,mcg,mg,mg,mg,mg,IU,mcg,mcg,mcg,mcg,mcg,0,mcg,mg,mg,IU,mg,mg,mcg,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,g,g,g,g,g,g,g,,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,,g,g,g,g,g,g,g,g,mg,g,g,mg,mg,g
"Eggplant, raw",g,25,g,,,mg,mg,mcg,mcg,mg,mg,mg,mg,IU,mcg,mcg,mcg,mcg,mcg,0,mcg,mg,mg,IU,mg,mg,mcg,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,g,g,g,g,g,g,g,,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,,g,,,g,g,g,g,g,mg,g,g,mg,mg,g
"Teff, uncooked",g,367,g,g,,mg,mg,,,mg,mg,mg,mg,IU,mcg,mcg,mcg,mcg,mcg,0,,mg,,,mg,mg,mcg,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,g,g,g,g,g,g,g,,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,,,g,,,g
"Sherbet, orange",g,144,g,g,mg,mg,mg,mcg,mcg,mg,mg,mg,mg,IU,mcg,mcg,mcg,mcg,mcg,0,mcg,mg,mg,IU,mg,mg,mcg,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,,,,,,,,,,,,,,,,,,,,g,g,g,,,,,,,g,g,g,g,mg,g,g,mg,mg,g


Note that the **calories** column (like **lucopene**) is numeric and was not affected by the regex replacement, because regex replacement only applies to string values. We can deal with this by casting the entire DataFrame to strings before doing the replacement, so that every column is affected. Remember, we are not doing anything with these numbers for the moment. We just want the units.
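A two-column sketch (hypothetical values copied from the output above) shows the difference the cast makes. Without `astype(str)`, the integer column passes through untouched. With it, every column loses its digits.

```python
import pandas as pd

pattern = r"\s*[^a-z+A-Z+]"
df = pd.DataFrame({"calories": [381, 691], "total_fat": ["0.1g", "72g"]})

# Regex replacement only touches string cells, so the integer column survives unchanged
without_cast = df.replace(pattern, "", regex=True)
print(without_cast["calories"].tolist())  # [381, 691]

# Casting everything to str first lets the regex strip the digits from every column
with_cast = df.astype(str).replace(pattern, "", regex=True)
print(with_cast["calories"].tolist())   # ['', '']
print(with_cast["total_fat"].tolist())  # ['g', 'g']
```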

In [101]:
units = nutrition.astype(str).replace(r'\s*[^a-z+A-Z+]', '', regex = True)
units

name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
Cornstarch,g,,g,,,mg,mg,mcg,mcg,mg,mg,mg,mg,IU,mcg,mcg,mcg,mcg,mcg,,mcg,mg,mg,IU,mg,mg,mcg,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,g,g,g,g,g,g,g,,g,g,g,g,g,g,g,g,g,g,g,g,g,g,,,,,,,g,g,g,g,mg,g,g,mg,mg,g
"Nuts, pecans",g,,g,g,,mg,mg,mcg,mcg,mg,mg,mg,mg,IU,mcg,mcg,mcg,mcg,mcg,,mcg,mg,mg,IU,mg,mg,mcg,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,g,g,g,g,g,g,g,,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,,g,g,g,g,g,g,g,g,mg,g,g,mg,mg,g
"Eggplant, raw",g,,g,,,mg,mg,mcg,mcg,mg,mg,mg,mg,IU,mcg,mcg,mcg,mcg,mcg,,mcg,mg,mg,IU,mg,mg,mcg,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,g,g,g,g,g,g,g,,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,,g,,,g,g,g,g,g,mg,g,g,mg,mg,g
"Teff, uncooked",g,,g,g,,mg,mg,,,mg,mg,mg,mg,IU,mcg,mcg,mcg,mcg,mcg,,,mg,,,mg,mg,mcg,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,g,g,g,g,g,g,g,,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,,,g,,,g
"Sherbet, orange",g,,g,g,mg,mg,mg,mcg,mcg,mg,mg,mg,mg,IU,mcg,mcg,mcg,mcg,mcg,,mcg,mg,mg,IU,mg,mg,mcg,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,,,,,,,,,,,,,,,,,,,,g,g,g,,,,,,,g,g,g,g,mg,g,g,mg,mg,g
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Beef, raw, all grades, trimmed to 0"" fat, separable lean and fat, boneless, top round roast, round",g,,g,g,mg,mg,mg,mcg,mcg,mg,mg,mg,mg,IU,mcg,mcg,mcg,mcg,mcg,,mcg,mg,mg,IU,mg,mg,mcg,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,,,,,,,g,g,g,g,mg,g,g,mg,mg,g
"Lamb, cooked, separable lean only, composite of trimmed retail cuts, frozen, imported, New Zealand",g,,g,g,mg,mg,,mcg,mcg,mg,mg,mg,mg,IU,mcg,,,,,,mcg,mg,mg,,mg,mg,,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,g,g,g,g,g,g,g,,g,g,g,g,g,g,g,g,g,g,g,g,g,,,,,,,,g,g,g,g,mg,,g,,,g
"Lamb, raw, separable lean and fat, composite of trimmed retail cuts, frozen, imported, New Zealand",g,,g,g,mg,mg,,mcg,mcg,mg,mg,mg,mg,IU,mcg,,,,,,mcg,mg,mg,,mg,mg,,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,g,g,g,g,g,g,g,,g,g,g,g,g,g,g,g,g,g,g,g,g,,,,,,,,g,g,g,g,mg,,g,,,g
"Beef, raw, all grades, trimmed to 0"" fat, separable lean only, boneless, eye of round roast, round",g,,g,g,mg,mg,mg,mcg,mcg,mg,mg,mg,mg,IU,mcg,mcg,mcg,mcg,mcg,,mcg,mg,mg,IU,mg,mg,mcg,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,,,,,,,g,g,g,g,mg,g,g,mg,mg,g


We've now removed all numerical values. But notice that we still have some gaps.
* Sometimes units are included, sometimes they're not
* There are some instances of `NaN`
* There are some columns that are totally blank

How do we deal with this? We can't just pick a random row to represent our units, as it may have a gap in one or more of the columns. 

One way to deal with this is to find the *mode unit* of each column, that is, the unit that appears most often in that column. There is a good chance that the mode unit is the unit that belongs in that column. For example, let's check out the **saturated_fat** column.

In [102]:
units.saturated_fat.value_counts()

g      7199
nan    1590
Name: saturated_fat, dtype: int64

We see here that 1590 foods (rows) have `nan` (probably representing foods with no saturated fat), while the remaining 7199 rows have `g` as their unit. Thus, we conclude that the most common unit is `g`.

How do we collect the mode unit for every column? Simple! Apply the `mode()` method to the DataFrame. This will give you a DataFrame reporting the mode unit for each column in the original DataFrame. Amazing!
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mode.html
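A toy version with hypothetical unit strings shows what `mode()` returns. It also shows one caveat: if a column has a tie, `mode()` returns every tied value (in sorted order), one per row. `.iloc[0]` then picks the first row as a Series that maps column names to units.

```python
import pandas as pd

# Hypothetical unit strings left over after stripping the numbers
units = pd.DataFrame({
    "saturated_fat": ["g", "g", "nan"],
    "sodium": ["mg", "mg", "mg"],
})

# .iloc[0] turns the single mode row into a Series: column name -> most common unit
print(units.mode().iloc[0].to_dict())  # {'saturated_fat': 'g', 'sodium': 'mg'}

# With a tie, mode() returns every tied value, one per row, in sorted order
tied = pd.DataFrame({"x": ["mg", "g"]})
print(tied.mode()["x"].tolist())  # ['g', 'mg']
```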

In [103]:
units.mode()

,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
0,g,,g,g,mg,mg,mg,mcg,mcg,mg,mg,mg,mg,IU,mcg,mcg,mcg,mcg,mcg,,mcg,mg,mg,IU,mg,mg,mcg,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,g,g,g,g,g,g,g,,g,g,g,g,g,g,g,g,g,g,g,g,g,g,,,,,,,g,g,g,g,mg,g,g,mg,mg,g


Now let's reassign the `units` dataframe to point to the *unit mode* DataFrame. We've got our units and now can do some real damage with them.

We'll continue this exercise in the next section.

In [104]:
units = units.mode()
units

,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
0,g,,g,g,mg,mg,mg,mcg,mcg,mg,mg,mg,mg,IU,mcg,mcg,mcg,mcg,mcg,,mcg,mg,mg,IU,mg,mg,mcg,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,g,g,g,g,g,g,g,,g,g,g,g,g,g,g,g,g,g,g,g,g,g,,,,,,,g,g,g,g,mg,g,g,mg,mg,g


## The `rename()` Method

`rename()` is one of the most useful methods in Pandas. It allows us to alter the axis labels of either our index or our columns. This will be critical for adding the units to the column names of the `nutrition` DataFrame.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html

Let's start with the simple `df` DataFrame from earlier.

In [105]:
df

Unnamed: 0,age,weight,height
0,12,41.1,1.72
1,13,34.5,1.74
2,14,83.2,1.91
3,16,90.1,1.54


If we want to rename our index labels, we can pass a dictionary to the `index` parameter, with the current index labels as the keys and the new labels as the values.
* Keep in mind that, like many other DataFrame methods, `rename()` returns a new DataFrame by default and leaves the original unchanged. To rename in place, use the familiar `inplace` parameter.

In [106]:
df.rename(index={0: 'Pikachu', 1: 'Andy'})

Unnamed: 0,age,weight,height
Pikachu,12,41.1,1.72
Andy,13,34.5,1.74
2,14,83.2,1.91
3,16,90.1,1.54
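The copy-versus-`inplace` behavior can be checked directly. This sketch rebuilds the small `df` from the values printed above; it shows that the default call leaves the original alone, while `inplace=True` modifies it and returns `None`.

```python
import pandas as pd

# Rebuilt from the printed output of the small df
df = pd.DataFrame({"age": [12, 13, 14, 16],
                   "weight": [41.1, 34.5, 83.2, 90.1],
                   "height": [1.72, 1.74, 1.91, 1.54]})

renamed = df.rename(index={0: "Pikachu", 1: "Andy"})
print(list(renamed.index))  # ['Pikachu', 'Andy', 2, 3]
print(list(df.index))       # [0, 1, 2, 3] (original untouched)

# With inplace=True the original is modified and the method returns None
result = df.rename(index={0: "Pikachu", 1: "Andy"}, inplace=True)
print(result)               # None
print(list(df.index))       # ['Pikachu', 'Andy', 2, 3]
```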


How about renaming the column labels? Easy, just pass the dictionary into the `columns` parameter!

In [107]:
df.rename(columns = {'weight':'Weight (kg)'})

Unnamed: 0,age,Weight (kg),height
0,12,41.1,1.72
1,13,34.5,1.74
2,14,83.2,1.91
3,16,90.1,1.54


Can you perform renaming on both axes at the same time? You bet!

In [108]:
df.rename(index={0: 'Pikachu', 1: 'Andy'}, columns = {'weight':'Weight (kg)'})

Unnamed: 0,age,Weight (kg),height
Pikachu,12,41.1,1.72
Andy,13,34.5,1.74
2,14,83.2,1.91
3,16,90.1,1.54


### An alternative approach to renaming - the `mapper` parameter

Using the `mapper` parameter within `rename()` is an alternative to using the `index` and `columns` parameters. To do this:
1. Pass a dictionary into `mapper` that maps the labels you want to replace to their replacements (just as with `index` and `columns`).
2. Specify the axis the mapper applies to with the `axis` parameter: `0` for "index" or `1` for "columns".

* Note that `mapper` cannot rename the rows and the columns at the same time. In that case we have to use `index` and `columns`.

In [109]:
df.rename(mapper={'height':'Height (m)'}, axis=1)

Unnamed: 0,age,weight,Height (m)
0,12,41.1,1.72
1,13,34.5,1.74
2,14,83.2,1.91
3,16,90.1,1.54
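Two related options, sketched on a cut-down copy of `df`: `axis` also accepts the strings `"index"` and `"columns"`, and the mapper can be a function (here the built-in `str.upper`) that is applied to every label on that axis instead of a dictionary.

```python
import pandas as pd

df = pd.DataFrame({"age": [12, 13], "weight": [41.1, 34.5], "height": [1.72, 1.74]})

# axis also accepts the strings "index" and "columns"
by_name = df.rename(mapper={"height": "Height (m)"}, axis="columns")

# mapper can also be a function that is applied to every label on the axis
upper = df.rename(mapper=str.upper, axis="columns")

print(list(by_name.columns))  # ['age', 'weight', 'Height (m)']
print(list(upper.columns))    # ['AGE', 'WEIGHT', 'HEIGHT']
```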


## DataFrame `dropna()`
We previously used the Series `dropna()` method to drop all `NaN` values from a Series. DataFrames also have a `dropna()` method that works the same way, but it comes with a few additional features. With DataFrames, we can specify the axis we want to drop along.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
 * This is yet another method that can utilize the `inplace` parameter.

Let's continue working with our small `df` DataFrame and insert an `NaN` value.

In [110]:
df.loc[2, 'weight'] = np.nan
df

Unnamed: 0,age,weight,height
0,12,41.1,1.72
1,13,34.5,1.74
2,14,,1.91
3,16,90.1,1.54


Let's also change an entire row to contain `NaN` values.

In [111]:
df.loc[1, :] = np.nan
df

Unnamed: 0,age,weight,height
0,12.0,41.1,1.72
1,,,
2,14.0,,1.91
3,16.0,90.1,1.54


We now have a DataFrame with at least one `NaN` in each column and at least one `NaN` in two of its four rows. Let's see how `dropna()` works here.


In [112]:
df.dropna()

Unnamed: 0,age,weight,height
0,12.0,41.1,1.72
3,16.0,90.1,1.54


As you can see, by default `dropna()` drops any *row* that contains at least one `NaN` value. This is because the `how` parameter defaults to "any": a row is dropped if ANY of its values is `NaN`. But we can customize this.
* We can select the axis on which the method operates (0/rows or 1/columns) with the `axis` parameter
* We can set the `how` parameter to "all", meaning a row (or column) is dropped only if ALL of its values are `NaN`

In [113]:
df.dropna(how = 'any', axis = 0)

Unnamed: 0,age,weight,height
0,12.0,41.1,1.72
3,16.0,90.1,1.54


If we set the axis to 0 (rows) and `how = 'all'`, only the row with all `NaN` values will be dropped.

In [114]:
df.dropna(how = 'all', axis = 0)

Unnamed: 0,age,weight,height
0,12.0,41.1,1.72
2,14.0,,1.91
3,16.0,90.1,1.54


Another parameter we can set is `thresh`. This lets us specify the minimum number of non-NA values a row (or column) must contain in order to be kept in the DataFrame.

In [115]:
df.dropna(thresh=3, axis = 0)

Unnamed: 0,age,weight,height
0,12.0,41.1,1.72
3,16.0,90.1,1.54
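One way to predict what `thresh` will do is to count the non-missing values per row yourself. This sketch rebuilds `df` as it looks at this point (from the printed output) and checks that `thresh=3` keeps exactly the rows whose count is at least 3.

```python
import numpy as np
import pandas as pd

# Rebuilt from the printed output at this point in the notebook
df = pd.DataFrame({"age": [12, np.nan, 14, 16],
                   "weight": [41.1, np.nan, np.nan, 90.1],
                   "height": [1.72, np.nan, 1.91, 1.54]})

# Count the non-missing values in each row
non_na_per_row = df.notna().sum(axis=1)
print(non_na_per_row.tolist())  # [3, 0, 2, 3]

# thresh=3 keeps exactly the rows whose count is at least 3
kept = df.dropna(thresh=3, axis=0)
assert kept.equals(df[non_na_per_row >= 3])
print(kept.index.tolist())      # [0, 3]
```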


Let's play with the axis parameter. What if we set `axis=1` for our small `df` DataFrame?

The following code returns an empty DataFrame! Why? Because every column has at least one `NaN` value, so every column is dropped.

In [116]:
df.dropna(axis = 1)

0
1
2
3


Let's play with the `thresh` parameter on the column axis. In this code, every column that does not have at least 3 non-NA values is dropped. This time we also pass `inplace=True`, so `df` itself is modified.

In [117]:
df.dropna(axis = 1, thresh = 3, inplace = True)
df

Unnamed: 0,age,height
0,12.0,1.72
1,,
2,14.0,1.91
3,16.0,1.54
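The same counting trick works for columns. In this sketch (again rebuilding `df` as it was just before the `inplace` call), summing `notna()` down each column shows why only **weight** falls below the threshold.

```python
import numpy as np
import pandas as pd

# df as it was just before the inplace dropna call
df = pd.DataFrame({"age": [12, np.nan, 14, 16],
                   "weight": [41.1, np.nan, np.nan, 90.1],
                   "height": [1.72, np.nan, 1.91, 1.54]})

# Default axis=0 sums down each column, giving one count per column
non_na_per_col = df.notna().sum()
print(non_na_per_col.tolist())  # [3, 2, 3]

kept = df.dropna(axis=1, thresh=3)
assert kept.equals(df.loc[:, non_na_per_col >= 3])
print(kept.columns.tolist())    # ['age', 'height']
```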


## BONUS Lecture - `dropna()` with Subset

The `subset` parameter of `dropna()` lets you specify which labels along the other axis to check for missing values: column labels when dropping rows, or index labels when dropping columns.

Let's start by adding another column, **gender**, to our small DataFrame.

In [118]:
df['gender'] = ['M', 'F', np.nan, 'F']
df

Unnamed: 0,age,height,gender
0,12.0,1.72,M
1,,,F
2,14.0,1.91,
3,16.0,1.54,F


If we run `dropna()`, rows 1 and 2 are dropped, since both of them contain at least one `NaN`.

In [119]:
df.dropna()

Unnamed: 0,age,height,gender
0,12.0,1.72,M
3,16.0,1.54,F


Using `subset`, we can restrict `dropna()` to only the columns that interest us. In this example, we are telling the method to drop any row that has a `NaN` in the **gender** column. The method ignores all other columns when deciding which rows to drop.

Thus, only the one row that has `NaN` in the **gender** column (row 2) will be dropped.

In [120]:
df.dropna(axis = 0, how = 'any', subset = ['gender'])

Unnamed: 0,age,height,gender
0,12.0,1.72,M
1,,,F
3,16.0,1.54,F
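For a single column, `subset` behaves like filtering on that column's `notna()` mask. This sketch rebuilds the current `df` from the printed output and checks that the two approaches agree.

```python
import numpy as np
import pandas as pd

# Rebuilt from the printed output after adding the gender column
df = pd.DataFrame({"age": [12, np.nan, 14, 16],
                   "height": [1.72, np.nan, 1.91, 1.54],
                   "gender": ["M", "F", np.nan, "F"]})

by_subset = df.dropna(axis=0, how="any", subset=["gender"])
by_mask = df[df["gender"].notna()]   # keep rows where gender is present

assert by_subset.equals(by_mask)
print(by_subset.index.tolist())      # [0, 1, 3]
```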


Let's try again with the **age** column.

In [121]:
df.dropna(axis = 0, how = 'any', subset = ['age'])

Unnamed: 0,age,height,gender
0,12.0,1.72,M
2,14.0,1.91,
3,16.0,1.54,F


This also works along the column axis. In that case, we pass a list of index labels (rows) to check when deciding which columns to drop. Let's try it.



In [122]:
df

Unnamed: 0,age,height,gender
0,12.0,1.72,M
1,,,F
2,14.0,1.91,
3,16.0,1.54,F


Below, we drop any column that has a `NaN` value in either of the rows labeled 1 or 2. Every column has a `NaN` in at least one of those rows, so ALL columns are dropped.

In [123]:
df.dropna(axis = 1, how = 'any', subset = [1, 2])

0
1
2
3


In this example, we drop any column that has a `NaN` value in either of the rows labeled 0 or 2. Only the **gender** column is dropped, because it has a `NaN` value in row 2. Row 0 has no `NaN` values, so it does not trigger any column drops.

In [124]:
df.dropna(axis = 1, how = 'any', subset = [0, 2])

Unnamed: 0,age,height
0,12.0,1.72
1,,
2,14.0,1.91
3,16.0,1.54
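The column-wise version can be written as a mask too: look only at the chosen rows and keep the columns whose values there are all present. A sketch using the same rebuilt `df`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [12, np.nan, 14, 16],
                   "height": [1.72, np.nan, 1.91, 1.54],
                   "gender": ["M", "F", np.nan, "F"]})

by_subset = df.dropna(axis=1, how="any", subset=[0, 2])

# One boolean per column: True if rows 0 and 2 are both non-missing there
keep = df.loc[[0, 2]].notna().all()
by_mask = df.loc[:, keep]

assert by_subset.equals(by_mask)
print(by_subset.columns.tolist())  # ['age', 'height']
```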


## Part 2 of Replacing the Values: Merging Units with Column Names

Remember that our goal is to convert all of our data to numeric values while preserving the units by moving that information into the column names.

Recall our `units` DataFrame:

In [125]:
units

Unnamed: 0,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
0,g,,g,g,mg,mg,mg,mcg,mcg,mg,mg,mg,mg,IU,mcg,mcg,mcg,mcg,mcg,,mcg,mg,mg,IU,mg,mg,mcg,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,g,g,g,g,g,g,g,,g,g,g,g,g,g,g,g,g,g,g,g,g,g,,,,,,,g,g,g,g,mg,g,g,mg,mg,g


We will now apply the `rename()` method. But first, we need to figure out how to iterate over the DataFrame so that we can do something with the units.

Using a `for` loop, we can grab all of the column labels from the `units` DataFrame, because iterating over a DataFrame directly yields its column labels.

In [126]:
for k in units:
  print(k)

serving_size
calories
total_fat
saturated_fat
cholesterol
sodium
choline
folate
folic_acid
niacin
pantothenic_acid
riboflavin
thiamin
vitamin_a
vitamin_a_rae
carotene_alpha
carotene_beta
cryptoxanthin_beta
lutein_zeaxanthin
lucopene
vitamin_b12
vitamin_b6
vitamin_c
vitamin_d
vitamin_e
tocopherol_alpha
vitamin_k
calcium
copper
irom
magnesium
manganese
phosphorous
potassium
selenium
zink
protein
alanine
arginine
aspartic_acid
cystine
glutamic_acid
glycine
histidine
hydroxyproline
isoleucine
leucine
lysine
methionine
phenylalanine
proline
serine
threonine
tryptophan
tyrosine
valine
carbohydrate
fiber
sugars
fructose
galactose
glucose
lactose
maltose
sucrose
fat
saturated_fatty_acids
monounsaturated_fatty_acids
polyunsaturated_fatty_acids
fatty_acids_total_trans
alcohol
ash
caffeine
theobromine
water


We can also use square-bracket indexing with each label to select that column from the DataFrame, which returns a Series.

In [127]:
for k in units:
  print(k, units[k])

serving_size 0    g
Name: serving_size, dtype: object
calories 0    
Name: calories, dtype: object
total_fat 0    g
Name: total_fat, dtype: object
saturated_fat 0    g
Name: saturated_fat, dtype: object
cholesterol 0    mg
Name: cholesterol, dtype: object
sodium 0    mg
Name: sodium, dtype: object
choline 0    mg
Name: choline, dtype: object
folate 0    mcg
Name: folate, dtype: object
folic_acid 0    mcg
Name: folic_acid, dtype: object
niacin 0    mg
Name: niacin, dtype: object
pantothenic_acid 0    mg
Name: pantothenic_acid, dtype: object
riboflavin 0    mg
Name: riboflavin, dtype: object
thiamin 0    mg
Name: thiamin, dtype: object
vitamin_a 0    IU
Name: vitamin_a, dtype: object
vitamin_a_rae 0    mcg
Name: vitamin_a_rae, dtype: object
carotene_alpha 0    mcg
Name: carotene_alpha, dtype: object
carotene_beta 0    mcg
Name: carotene_beta, dtype: object
cryptoxanthin_beta 0    mcg
Name: cryptoxanthin_beta, dtype: object
lutein_zeaxanthin 0    mcg
Name: lutein_zeaxanthin, dtype: object

Instead of the entire Series object, we just want the unit contained within the series. Where is that unit located? Let's look at just one of these Series to find out.

In [128]:
for item in units['total_fat']:
  print(item)

g


As we might have expected, each column holds only one item, with index label 0. That makes sense, since `units.mode()` returned a DataFrame with only one row! Let's grab the unit from each column with `.at[0]`, which looks up a value by its index label.

In [129]:
for k in units:
  print(k, units[k].at[0])

serving_size g
calories 
total_fat g
saturated_fat g
cholesterol mg
sodium mg
choline mg
folate mcg
folic_acid mcg
niacin mg
pantothenic_acid mg
riboflavin mg
thiamin mg
vitamin_a IU
vitamin_a_rae mcg
carotene_alpha mcg
carotene_beta mcg
cryptoxanthin_beta mcg
lutein_zeaxanthin mcg
lucopene 
vitamin_b12 mcg
vitamin_b6 mg
vitamin_c mg
vitamin_d IU
vitamin_e mg
tocopherol_alpha mg
vitamin_k mcg
calcium mg
copper mg
irom mg
magnesium mg
manganese mg
phosphorous mg
potassium mg
selenium mcg
zink mg
protein g
alanine g
arginine g
aspartic_acid g
cystine g
glutamic_acid g
glycine g
histidine g
hydroxyproline 
isoleucine g
leucine g
lysine g
methionine g
phenylalanine g
proline g
serine g
threonine g
tryptophan g
tyrosine g
valine g
carbohydrate g
fiber g
sugars g
fructose 
galactose 
glucose 
lactose 
maltose 
sucrose 
fat g
saturated_fatty_acids g
monounsaturated_fatty_acids g
polyunsaturated_fatty_acids g
fatty_acids_total_trans mg
alcohol g
ash g
caffeine mg
theobromine mg
water g
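As an alternative loop style, `DataFrame.items()` yields `(label, Series)` pairs, so the separate `units[k]` lookup is not needed. The sketch below uses a hypothetical three-column stand-in for `units` and also contrasts `.at[0]` (index label 0) with `.iat[0]` (position 0), which coincide here because the only row has label 0.

```python
import pandas as pd

# Hypothetical one-row stand-in for the units DataFrame
units = pd.DataFrame({"serving_size": ["g"], "calories": [""], "sodium": ["mg"]})

# Iterating over a DataFrame yields its column labels
assert list(units) == list(units.columns)

# items() yields (label, Series) pairs
for label, column in units.items():
    # .at[0] looks up index *label* 0; .iat[0] looks up *position* 0
    print(label, repr(column.at[0]), column.at[0] == column.iat[0])
```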


Great! Now let's print the units on their own, without the column labels. Notice the gaps: some characteristics have no units at all.

In [130]:
for k in units:
  print(units[k].at[0])

g

g
g
mg
mg
mg
mcg
mcg
mg
mg
mg
mg
IU
mcg
mcg
mcg
mcg
mcg

mcg
mg
mg
IU
mg
mg
mcg
mg
mg
mg
mg
mg
mg
mg
mcg
mg
g
g
g
g
g
g
g
g

g
g
g
g
g
g
g
g
g
g
g
g
g
g






g
g
g
g
mg
g
g
mg
mg
g


Column labels for food characteristics that have no units don't need to be modified, so we can remove those columns from the `units` DataFrame.

To remove them, we can use the `replace()` method to replace the blank strings with `NaN`, then chain on `dropna(axis=1)` to drop every column that contains a `NaN`.

In [131]:
units = units.replace('', np.nan).dropna(axis=1)
units

Unnamed: 0,serving_size,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
0,g,g,g,mg,mg,mg,mcg,mcg,mg,mg,mg,mg,IU,mcg,mcg,mcg,mcg,mcg,mcg,mg,mg,IU,mg,mg,mcg,mg,mg,mg,mg,mg,mg,mg,mcg,mg,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,g,mg,g,g,mg,mg,g
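The replace-then-drop step is needed because an empty string is not a missing value as far as `dropna()` is concerned. A small sketch on a hypothetical four-column stand-in for `units`:

```python
import numpy as np
import pandas as pd

# Hypothetical one-row stand-in for units, with two blank units
units = pd.DataFrame({"serving_size": ["g"], "calories": [""],
                      "sodium": ["mg"], "fructose": [""]})

# Empty strings are not treated as missing, so dropna() alone keeps every column
assert units.dropna(axis=1).shape[1] == 4

# Turn blanks into NaN first, then drop every column that contains NaN
cleaned = units.replace("", np.nan).dropna(axis=1)
print(cleaned.columns.tolist())  # ['serving_size', 'sodium']
```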


In [132]:
for k in units:
  print(units[k].at[0])

g
g
g
mg
mg
mg
mcg
mcg
mg
mg
mg
mg
IU
mcg
mcg
mcg
mcg
mcg
mcg
mg
mg
IU
mg
mg
mcg
mg
mg
mg
mg
mg
mg
mg
mcg
mg
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
mg
g
g
mg
mg
g


Now that we have collected all of the units in a form that we can easily iterate over, the next step is to create a mapper: a dictionary whose keys are the old column labels and whose values are the new ones. We will then pass this mapper to `rename()` on the original DataFrame to replace its column labels with the "unified" labels.

Let's build our dictionary. Below is a conceptual example of the dictionary we are trying to build.

In [133]:
{
    'serving_size':'serving_size_g',
    'total_fat': 'total_fat_g'
}

{'serving_size': 'serving_size_g', 'total_fat': 'total_fat_g'}

Let's use a **dictionary comprehension** to build this dictionary quickly. For each column in `units`, we capture the column label `k` as well as that column's unit.

In [134]:
mapper = {k:units[k].at[0] for k in units}
mapper

{'alanine': 'g',
 'alcohol': 'g',
 'arginine': 'g',
 'ash': 'g',
 'aspartic_acid': 'g',
 'caffeine': 'mg',
 'calcium': 'mg',
 'carbohydrate': 'g',
 'carotene_alpha': 'mcg',
 'carotene_beta': 'mcg',
 'cholesterol': 'mg',
 'choline': 'mg',
 'copper': 'mg',
 'cryptoxanthin_beta': 'mcg',
 'cystine': 'g',
 'fat': 'g',
 'fatty_acids_total_trans': 'mg',
 'fiber': 'g',
 'folate': 'mcg',
 'folic_acid': 'mcg',
 'glutamic_acid': 'g',
 'glycine': 'g',
 'histidine': 'g',
 'irom': 'mg',
 'isoleucine': 'g',
 'leucine': 'g',
 'lutein_zeaxanthin': 'mcg',
 'lysine': 'g',
 'magnesium': 'mg',
 'manganese': 'mg',
 'methionine': 'g',
 'monounsaturated_fatty_acids': 'g',
 'niacin': 'mg',
 'pantothenic_acid': 'mg',
 'phenylalanine': 'g',
 'phosphorous': 'mg',
 'polyunsaturated_fatty_acids': 'g',
 'potassium': 'mg',
 'proline': 'g',
 'protein': 'g',
 'riboflavin': 'mg',
 'saturated_fat': 'g',
 'saturated_fatty_acids': 'g',
 'selenium': 'mcg',
 'serine': 'g',
 'serving_size': 'g',
 'sodium': 'mg',
 'sugars': 'g

This is getting close. But we don't just want the units in isolation because we wouldn't have any idea of what the nutrition characteristic is. Instead, we want to *append* the units to the existing column names.

In [135]:
mapper = {k: k + "_" + units[k].at[0] for k in units}
mapper

{'alanine': 'alanine_g',
 'alcohol': 'alcohol_g',
 'arginine': 'arginine_g',
 'ash': 'ash_g',
 'aspartic_acid': 'aspartic_acid_g',
 'caffeine': 'caffeine_mg',
 'calcium': 'calcium_mg',
 'carbohydrate': 'carbohydrate_g',
 'carotene_alpha': 'carotene_alpha_mcg',
 'carotene_beta': 'carotene_beta_mcg',
 'cholesterol': 'cholesterol_mg',
 'choline': 'choline_mg',
 'copper': 'copper_mg',
 'cryptoxanthin_beta': 'cryptoxanthin_beta_mcg',
 'cystine': 'cystine_g',
 'fat': 'fat_g',
 'fatty_acids_total_trans': 'fatty_acids_total_trans_mg',
 'fiber': 'fiber_g',
 'folate': 'folate_mcg',
 'folic_acid': 'folic_acid_mcg',
 'glutamic_acid': 'glutamic_acid_g',
 'glycine': 'glycine_g',
 'histidine': 'histidine_g',
 'irom': 'irom_mg',
 'isoleucine': 'isoleucine_g',
 'leucine': 'leucine_g',
 'lutein_zeaxanthin': 'lutein_zeaxanthin_mcg',
 'lysine': 'lysine_g',
 'magnesium': 'magnesium_mg',
 'manganese': 'manganese_mg',
 'methionine': 'methionine_g',
 'monounsaturated_fatty_acids': 'monounsaturated_fatty_acids
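An alternative sketch, not the approach used above: if `units` still contained the blank columns, the comprehension could skip them itself with an `if` clause, and `items()` plus an f-string keeps it compact. The three-column `units` here is a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical stand-in for units, still containing a blank unit
units = pd.DataFrame({"serving_size": ["g"], "calories": [""], "sodium": ["mg"]})

# Build {old_label: new_label}, skipping columns whose unit is blank
mapper = {label: f"{label}_{col.at[0]}"
          for label, col in units.items()
          if col.at[0] != ""}
print(mapper)  # {'serving_size': 'serving_size_g', 'sodium': 'sodium_mg'}
```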

Now for the grand almost-finale: let's use the `rename()` method to rename the columns of our `nutrition` DataFrame using our mapper dictionary!

In [136]:
nutrition.rename(columns=mapper, inplace = True)
nutrition

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
Cornstarch,100 g,381,0.1g,,0,9.00 mg,0.4 mg,0.00 mcg,0.00 mcg,0.000 mg,0.000 mg,0.000 mg,0.000 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,0.00 mcg,0.000 mg,0.0 mg,0.00 IU,0.00 mg,0.00 mg,0.0 mcg,2.00 mg,0.050 mg,0.47 mg,3.00 mg,0.053 mg,13.00 mg,3.00 mg,2.8 mcg,0.06 mg,0.26 g,0.019 g,0.012 g,0.020 g,0.006 g,0.053 g,0.009 g,0.008 g,0,0.010 g,0.036 g,0.006 g,0.006 g,0.013 g,0.024 g,0.012 g,0.009 g,0.001 g,0.010 g,0.014 g,91.27 g,0.9 g,0.00 g,0,0,0,0,0,0,0.05 g,0.009 g,0.016 g,0.025 g,0.00 mg,0.0 g,0.09 g,0.00 mg,0.00 mg,8.32 g
"Nuts, pecans",100 g,691,72g,6.2g,0,0.00 mg,40.5 mg,22.00 mcg,0.00 mcg,1.167 mg,0.863 mg,0.130 mg,0.660 mg,56.00 IU,3.00 mcg,0.00 mcg,29.00 mcg,9.00 mcg,17.00 mcg,0,0.00 mcg,0.210 mg,1.1 mg,0.00 IU,1.40 mg,1.40 mg,3.5 mcg,70.00 mg,1.200 mg,2.53 mg,121.00 mg,4.500 mg,277.00 mg,410.00 mg,3.8 mcg,4.53 mg,9.17 g,0.397 g,1.177 g,0.929 g,0.152 g,1.829 g,0.453 g,0.262 g,0,0.336 g,0.598 g,0.287 g,0.183 g,0.426 g,0.363 g,0.474 g,0.306 g,0.093 g,0.215 g,0.411 g,13.86 g,9.6 g,3.97 g,0.04 g,0,0.04 g,0.00 g,0.00 g,3.90 g,71.97 g,6.180 g,40.801 g,21.614 g,0.00 mg,0.0 g,1.49 g,0.00 mg,0.00 mg,3.52 g
"Eggplant, raw",100 g,25,0.2g,,0,2.00 mg,6.9 mg,22.00 mcg,0.00 mcg,0.649 mg,0.281 mg,0.037 mg,0.039 mg,23.00 IU,1.00 mcg,0.00 mcg,14.00 mcg,0.00 mcg,36.00 mcg,0,0.00 mcg,0.084 mg,2.2 mg,0.00 IU,0.30 mg,0.30 mg,3.5 mcg,9.00 mg,0.081 mg,0.23 mg,14.00 mg,0.232 mg,24.00 mg,229.00 mg,0.3 mcg,0.16 mg,0.98 g,0.051 g,0.057 g,0.164 g,0.006 g,0.186 g,0.041 g,0.023 g,0,0.045 g,0.064 g,0.047 g,0.011 g,0.043 g,0.043 g,0.042 g,0.037 g,0.009 g,0.027 g,0.053 g,5.88 g,3.0 g,3.53 g,1.54 g,0,1.58 g,0,0,0.26 g,0.18 g,0.034 g,0.016 g,0.076 g,0.00 mg,0.0 g,0.66 g,0.00 mg,0.00 mg,92.30 g
"Teff, uncooked",100 g,367,2.4g,0.4g,0,12.00 mg,13.1 mg,0,0,3.363 mg,0.942 mg,0.270 mg,0.390 mg,9.00 IU,0.00 mcg,0.00 mcg,5.00 mcg,0.00 mcg,66.00 mcg,0,0,0.482 mg,0,0,0.08 mg,0.08 mg,1.9 mcg,180.00 mg,0.810 mg,7.63 mg,184.00 mg,9.240 mg,429.00 mg,427.00 mg,4.4 mcg,3.63 mg,13.30 g,0.747 g,0.517 g,0.820 g,0.236 g,3.349 g,0.477 g,0.301 g,0,0.501 g,1.068 g,0.376 g,0.428 g,0.698 g,0.664 g,0.622 g,0.510 g,0.139 g,0.458 g,0.686 g,73.13 g,8.0 g,1.84 g,0.47 g,0.00 g,0.73 g,0.00 g,0.01 g,0.62 g,2.38 g,0.449 g,0.589 g,1.071 g,0,0,2.37 g,0,0,8.82 g
"Sherbet, orange",100 g,144,2g,1.2g,1mg,46.00 mg,7.7 mg,4.00 mcg,0.00 mcg,0.063 mg,0.224 mg,0.097 mg,0.027 mg,46.00 IU,12.00 mcg,0.00 mcg,1.00 mcg,5.00 mcg,7.00 mcg,0,0.13 mcg,0.023 mg,2.3 mg,0.00 IU,0.01 mg,0.01 mg,0.0 mcg,54.00 mg,0.028 mg,0.14 mg,8.00 mg,0.011 mg,40.00 mg,96.00 mg,1.5 mcg,0.48 mg,1.10 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30.40 g,1.3 g,24.32 g,0,0,0,0,0,0,2.00 g,1.160 g,0.530 g,0.080 g,1.00 mg,0.0 g,0.40 g,0.00 mg,0.00 mg,66.10 g
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Beef, raw, all grades, trimmed to 0"" fat, separable lean and fat, boneless, top round roast, round",100 g,125,3.5g,1.4g,62mg,54.00 mg,64.5 mg,4.00 mcg,0.00 mcg,6.422 mg,0.356 mg,0.234 mg,0.063 mg,11.00 IU,3.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,1.64 mcg,0.631 mg,0.0 mg,1.00 IU,0.23 mg,0.23 mg,1.5 mcg,13.00 mg,0.048 mg,2.33 mg,12.00 mg,0.004 mg,219.00 mg,311.00 mg,22.1 mcg,3.67 mg,23.45 g,1.454 g,1.597 g,2.285 g,0.239 g,3.834 g,1.154 g,0.879 g,0.160 g,1.092 g,2.021 g,2.246 g,0.635 g,0.941 g,1.052 g,0.966 g,1.105 g,0.262 g,0.874 g,1.172 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,3.50 g,1.353 g,1.554 g,0.244 g,62.00 mg,0.0 g,1.11 g,0.00 mg,0.00 mg,72.51 g
"Lamb, cooked, separable lean only, composite of trimmed retail cuts, frozen, imported, New Zealand",100 g,206,8.9g,3.9g,109mg,50.00 mg,0,0.00 mcg,0.00 mcg,7.680 mg,0.580 mg,0.500 mg,0.130 mg,0.00 IU,0.00 mcg,0,0,0,0,0,2.95 mcg,0.140 mg,0.0 mg,0,0.19 mg,0.19 mg,0,13.00 mg,0.114 mg,2.35 mg,22.00 mg,0.029 mg,246.00 mg,188.00 mg,2.0 mcg,4.30 mg,29.59 g,1.780 g,1.758 g,2.605 g,0.353 g,4.294 g,1.445 g,0.937 g,0,1.428 g,2.302 g,2.613 g,0.759 g,1.205 g,1.241 g,1.100 g,1.267 g,0.346 g,0.995 g,1.597 g,0.00 g,0.0 g,0,0,0,0,0,0,0,8.86 g,3.860 g,3.480 g,0.520 g,109.00 mg,0,1.60 g,0,0,59.95 g
"Lamb, raw, separable lean and fat, composite of trimmed retail cuts, frozen, imported, New Zealand",100 g,277,23g,12g,78mg,39.00 mg,0,1.00 mcg,0.00 mcg,6.550 mg,0.520 mg,0.320 mg,0.130 mg,0.00 IU,0.00 mcg,0,0,0,0,0,2.42 mcg,0.110 mg,0.0 mg,0,0.21 mg,0.21 mg,0,13.00 mg,0.083 mg,1.49 mg,15.00 mg,0.018 mg,168.00 mg,136.00 mg,1.3 mcg,2.39 mg,16.74 g,1.007 g,0.994 g,1.473 g,0.200 g,2.429 g,0.818 g,0.530 g,0,0.808 g,1.302 g,1.478 g,0.430 g,0.681 g,0.702 g,0.622 g,0.716 g,0.196 g,0.563 g,0.903 g,0.00 g,0.0 g,0,0,0,0,0,0,0,22.74 g,11.570 g,8.720 g,0.980 g,78.00 mg,0,0.92 g,0,0,59.80 g
"Beef, raw, all grades, trimmed to 0"" fat, separable lean only, boneless, eye of round roast, round",100 g,121,3g,1.1g,60mg,53.00 mg,64.2 mg,4.00 mcg,0.00 mcg,6.720 mg,0.355 mg,0.184 mg,0.063 mg,4.00 IU,1.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,1.84 mcg,0.644 mg,0.0 mg,1.00 IU,0.24 mg,0.24 mg,1.5 mcg,13.00 mg,0.042 mg,1.45 mg,12.00 mg,0.001 mg,222.00 mg,319.00 mg,22.6 mcg,3.42 mg,23.37 g,1.525 g,1.714 g,2.468 g,0.256 g,4.167 g,1.101 g,0.948 g,0.118 g,1.192 g,2.198 g,2.457 g,0.679 g,1.018 g,1.073 g,1.037 g,1.201 g,0.287 g,0.954 g,1.259 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,3.04 g,1.086 g,1.266 g,0.233 g,60.00 mg,0.0 g,1.10 g,0.00 mg,0.00 mg,73.43 g


Look at that, so beautiful and elegant. The best part is that only the columns that have units are affected by this rename. Columns without units are left alone, which is exactly what we want.
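Why are the unitless columns left alone? With a dictionary, `rename()` only touches labels that appear as keys, and by default (`errors="ignore"`) keys that match no label are silently skipped. A sketch on a hypothetical one-row slice of `nutrition`; passing `errors="raise"` turns an unmatched key into a `KeyError`.

```python
import pandas as pd

# Hypothetical one-row slice of the nutrition data
nutrition = pd.DataFrame({"serving_size": ["100 g"], "calories": [381], "sodium": ["9.00 mg"]},
                         index=pd.Index(["Cornstarch"], name="name"))
mapper = {"serving_size": "serving_size_g", "sodium": "sodium_mg", "not_a_column": "x"}

# calories has no key, so it keeps its name; the unmatched key is ignored
renamed = nutrition.rename(columns=mapper)
print(renamed.columns.tolist())  # ['serving_size_g', 'calories', 'sodium_mg']

# errors="raise" reports keys that match no existing label
try:
    nutrition.rename(columns=mapper, errors="raise")
except KeyError as exc:
    print("KeyError:", exc)
```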

## Part 3 of Replacing the Values: Removing Units from Values

In this last part, we will remove all of the units from the values in the `nutrition` DataFrame. Nothing new, just an exercise in the methods we've already used.

Our approach of choice is the `replace()` method with *regular expressions*.

Several regex patterns would work. Here we use `[a-z+A-Z+]`, a character class that matches any single lowercase or uppercase letter (inside square brackets, `+` is just a literal plus sign). We then tell `replace()` to replace every match with an empty string, leaving only the numbers behind.

In [137]:
nutrition.replace('[a-z+A-Z+]', '', regex = True, inplace=True)
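A quick way to check a pattern before running it on the whole DataFrame is to try it on a few sample strings. In this sketch the strings are modeled on values from the table, plus a contrived `"1+2"` added only to show that the bracketed `+` is matched literally. The simpler letters-only pattern `[a-zA-Z]+` is shown for comparison, with `str.strip()` removing the leftover spaces.

```python
import pandas as pd

# Sample strings modeled on the table, plus a contrived "1+2"
values = pd.Series(["100 g", "0.1g", "9.00 mg", "2.8 mcg", "56.00 IU", "1+2"])

# Inside [...] the + is literal, so this class also deletes "+"
print(values.replace("[a-z+A-Z+]", "", regex=True).tolist())
# ['100 ', '0.1', '9.00 ', '2.8 ', '56.00 ', '12']

# Letters only, in runs; strip() removes the leftover space
print(values.replace("[a-zA-Z]+", "", regex=True).str.strip().tolist())
# ['100', '0.1', '9.00', '2.8', '56.00', '1+2']
```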

In [138]:
nutrition.head()

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
Cornstarch,100,381,0.1,,0,9.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.05,0.47,3.0,0.053,13.0,3.0,2.8,0.06,0.26,0.019,0.012,0.02,0.006,0.053,0.009,0.008,0,0.01,0.036,0.006,0.006,0.013,0.024,0.012,0.009,0.001,0.01,0.014,91.27,0.9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.009,0.016,0.025,0.0,0.0,0.09,0.0,0.0,8.32
"Nuts, pecans",100,691,72.0,6.2,0,0.0,40.5,22.0,0.0,1.167,0.863,0.13,0.66,56.0,3.0,0.0,29.0,9.0,17.0,0,0.0,0.21,1.1,0.0,1.4,1.4,3.5,70.0,1.2,2.53,121.0,4.5,277.0,410.0,3.8,4.53,9.17,0.397,1.177,0.929,0.152,1.829,0.453,0.262,0,0.336,0.598,0.287,0.183,0.426,0.363,0.474,0.306,0.093,0.215,0.411,13.86,9.6,3.97,0.04,0.0,0.04,0.0,0.0,3.9,71.97,6.18,40.801,21.614,0.0,0.0,1.49,0.0,0.0,3.52
"Eggplant, raw",100,25,0.2,,0,2.0,6.9,22.0,0.0,0.649,0.281,0.037,0.039,23.0,1.0,0.0,14.0,0.0,36.0,0,0.0,0.084,2.2,0.0,0.3,0.3,3.5,9.0,0.081,0.23,14.0,0.232,24.0,229.0,0.3,0.16,0.98,0.051,0.057,0.164,0.006,0.186,0.041,0.023,0,0.045,0.064,0.047,0.011,0.043,0.043,0.042,0.037,0.009,0.027,0.053,5.88,3.0,3.53,1.54,0.0,1.58,0.0,0.0,0.26,0.18,0.034,0.016,0.076,0.0,0.0,0.66,0.0,0.0,92.3
"Teff, uncooked",100,367,2.4,0.4,0,12.0,13.1,0.0,0.0,3.363,0.942,0.27,0.39,9.0,0.0,0.0,5.0,0.0,66.0,0,0.0,0.482,0.0,0.0,0.08,0.08,1.9,180.0,0.81,7.63,184.0,9.24,429.0,427.0,4.4,3.63,13.3,0.747,0.517,0.82,0.236,3.349,0.477,0.301,0,0.501,1.068,0.376,0.428,0.698,0.664,0.622,0.51,0.139,0.458,0.686,73.13,8.0,1.84,0.47,0.0,0.73,0.0,0.01,0.62,2.38,0.449,0.589,1.071,0.0,0.0,2.37,0.0,0.0,8.82
"Sherbet, orange",100,144,2.0,1.2,1,46.0,7.7,4.0,0.0,0.063,0.224,0.097,0.027,46.0,12.0,0.0,1.0,5.0,7.0,0,0.13,0.023,2.3,0.0,0.01,0.01,0.0,54.0,0.028,0.14,8.0,0.011,40.0,96.0,1.5,0.48,1.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,30.4,1.3,24.32,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.16,0.53,0.08,1.0,0.0,0.4,0.0,0.0,66.1


We're not quite done yet, though. If we look at the data types, we'll see that 73 of our 75 columns are non-numeric: they have dtype "object".

In [139]:
nutrition.dtypes.value_counts()

object    73
int64      2
dtype: int64
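To see *which* columns are still non-numeric, `select_dtypes(exclude="number")` lists them. The sketch below uses a hypothetical two-row frame that includes a stray `"n/a"` value, which is not in the real data. `astype(float)` would raise a `ValueError` on such a value; `pd.to_numeric(..., errors="coerce")`, applied column by column, turns it into `NaN` instead.

```python
import pandas as pd

# Hypothetical miniature frame: numbers stored as text, plus one stray value
nutrition = pd.DataFrame({"serving_size_g": ["100", "100"],
                          "calories": [381, 691],
                          "total_fat_g": ["0.1", "n/a"]})

# List the columns whose dtype is not numeric yet
print(nutrition.select_dtypes(exclude="number").columns.tolist())
# ['serving_size_g', 'total_fat_g']

# Convert each column; unparseable text becomes NaN instead of raising
converted = nutrition.apply(pd.to_numeric, errors="coerce")
print(converted.dtypes)
```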

To fix the non-numeric columns, we convert every column to *float* using the `.astype()` method.

In [140]:
nutrition = nutrition.astype(float)
nutrition.head()

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
Cornstarch,100.0,381.0,0.1,,0.0,9.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.05,0.47,3.0,0.053,13.0,3.0,2.8,0.06,0.26,0.019,0.012,0.02,0.006,0.053,0.009,0.008,0.0,0.01,0.036,0.006,0.006,0.013,0.024,0.012,0.009,0.001,0.01,0.014,91.27,0.9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.009,0.016,0.025,0.0,0.0,0.09,0.0,0.0,8.32
"Nuts, pecans",100.0,691.0,72.0,6.2,0.0,0.0,40.5,22.0,0.0,1.167,0.863,0.13,0.66,56.0,3.0,0.0,29.0,9.0,17.0,0.0,0.0,0.21,1.1,0.0,1.4,1.4,3.5,70.0,1.2,2.53,121.0,4.5,277.0,410.0,3.8,4.53,9.17,0.397,1.177,0.929,0.152,1.829,0.453,0.262,0.0,0.336,0.598,0.287,0.183,0.426,0.363,0.474,0.306,0.093,0.215,0.411,13.86,9.6,3.97,0.04,0.0,0.04,0.0,0.0,3.9,71.97,6.18,40.801,21.614,0.0,0.0,1.49,0.0,0.0,3.52
"Eggplant, raw",100.0,25.0,0.2,,0.0,2.0,6.9,22.0,0.0,0.649,0.281,0.037,0.039,23.0,1.0,0.0,14.0,0.0,36.0,0.0,0.0,0.084,2.2,0.0,0.3,0.3,3.5,9.0,0.081,0.23,14.0,0.232,24.0,229.0,0.3,0.16,0.98,0.051,0.057,0.164,0.006,0.186,0.041,0.023,0.0,0.045,0.064,0.047,0.011,0.043,0.043,0.042,0.037,0.009,0.027,0.053,5.88,3.0,3.53,1.54,0.0,1.58,0.0,0.0,0.26,0.18,0.034,0.016,0.076,0.0,0.0,0.66,0.0,0.0,92.3
"Teff, uncooked",100.0,367.0,2.4,0.4,0.0,12.0,13.1,0.0,0.0,3.363,0.942,0.27,0.39,9.0,0.0,0.0,5.0,0.0,66.0,0.0,0.0,0.482,0.0,0.0,0.08,0.08,1.9,180.0,0.81,7.63,184.0,9.24,429.0,427.0,4.4,3.63,13.3,0.747,0.517,0.82,0.236,3.349,0.477,0.301,0.0,0.501,1.068,0.376,0.428,0.698,0.664,0.622,0.51,0.139,0.458,0.686,73.13,8.0,1.84,0.47,0.0,0.73,0.0,0.01,0.62,2.38,0.449,0.589,1.071,0.0,0.0,2.37,0.0,0.0,8.82
"Sherbet, orange",100.0,144.0,2.0,1.2,1.0,46.0,7.7,4.0,0.0,0.063,0.224,0.097,0.027,46.0,12.0,0.0,1.0,5.0,7.0,0.0,0.13,0.023,2.3,0.0,0.01,0.01,0.0,54.0,0.028,0.14,8.0,0.011,40.0,96.0,1.5,0.48,1.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,30.4,1.3,24.32,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.16,0.53,0.08,1.0,0.0,0.4,0.0,0.0,66.1


Problem solved! Now every column is a float.

In [141]:
nutrition.dtypes.value_counts()

float64    75
dtype: int64
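`dtypes.value_counts()` is a quick way to confirm that no non-numeric columns remain. Below is a small sketch on a hypothetical two-column frame in which `sodium_mg` is still stored as text. `select_dtypes()` lists the columns that still need converting, and `astype()` is one possible way to convert them; it is not necessarily the conversion used earlier in this section.

```python
import pandas as pd

# Hypothetical frame: sodium_mg is still stored as text
foods = pd.DataFrame(
    {"calories": [381.0, 691.0],
     "sodium_mg": ["9", "0"]},
    index=["Cornstarch", "Nuts, pecans"],
)

# One float64 column and one non-numeric column
print(foods.dtypes.value_counts())

# select_dtypes(exclude="number") lists the columns that still need converting
non_numeric = foods.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)  # ['sodium_mg']

# Convert them to float64, then re-check: every column is now float64
foods[non_numeric] = foods[non_numeric].astype("float64")
print(foods.dtypes.value_counts())
```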

Let's try some basic math to demonstrate.

In [142]:
nutrition.total_fat_g.sum()

92784.20000000001
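`sum()` runs even when a column contains missing values, like the empty `saturated_fat_g` cells for Cornstarch and Eggplant in the table above, because it skips `NaN` by default (`skipna=True`). A minimal sketch on a hypothetical three-row frame follows. The long tail of digits in `92784.20000000001` is ordinary floating-point rounding, not an error.

```python
import numpy as np
import pandas as pd

# Hypothetical mini version of the nutrition table with missing values
foods = pd.DataFrame(
    {"total_fat_g": [0.1, 72.0, 0.2],
     "saturated_fat_g": [np.nan, 6.2, np.nan]},
    index=["Cornstarch", "Nuts, pecans", "Eggplant, raw"],
)

# sum() ignores NaN by default ...
print(foods.saturated_fat_g.sum())              # 6.2
# ... but propagates it when skipna=False
print(foods.saturated_fat_g.sum(skipna=False))  # nan

# On a whole DataFrame, sum() returns one total per column
print(foods.sum())
```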

In [143]:
nutrition.info(verbose = False)

<class 'pandas.core.frame.DataFrame'>
Index: 8789 entries, Cornstarch to Beef, raw, all grades, trimmed to 0" fat, separable lean only, boneless, eye of round steak, round
Columns: 75 entries, serving_size_g to water_g
dtypes: float64(75)
memory usage: 5.4+ MB


## Filtering DataFrames in 2D with `filter()`

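When you have very large datasets, it is useful to slice out just the row or column labels you are interested in. The `filter()` method does this: it selects rows or columns by their *labels*, not by the values inside them. It works like the Series `filter()` method, but on a DataFrame we can choose the axis: `axis=0` matches against the index (row labels) and `axis=1` against the column labels. If no axis is given, a DataFrame filters its columns.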
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html
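A minimal sketch of filtering along each axis, using a hypothetical three-row excerpt whose row labels are food names, as in `nutrition`:

```python
import pandas as pd

# Hypothetical excerpt; row labels are food names, like `nutrition`
foods = pd.DataFrame(
    {"calories": [56.0, 82.0, 381.0],
     "cholesterol_mg": [41.0, 48.0, 0.0],
     "vitamin_b12_mcg": [0.0, 20.0, 0.0]},
    index=["Octopus (Alaska Native)", "Mollusks, raw, common, octopus", "Cornstarch"],
)

# axis=0 matches `like` against the row labels (substring, case-sensitive)
rows = foods.filter(like="Octopus", axis=0)
print(rows.index.tolist())    # ['Octopus (Alaska Native)']

# With no axis given, a DataFrame filters its column labels
cols = foods.filter(like="_mg")
print(cols.columns.tolist())  # ['cholesterol_mg']
```

Note that `items`, `like`, and `regex` are mutually exclusive: pass only one of them per `filter()` call.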

Suppose we want to scrutinize octopus, since we've been told it has a lot of cholesterol. Let's search the row labels (the index) for "Octopus".

In [144]:
nutrition.filter(like = 'Octopus', axis=0)

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
Octopus (Alaska Native),100.0,56.0,0.8,0.2,41.0,0.0,0.0,0.0,0.0,2.0,0.0,0.04,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,35.0,0.37,4.9,0.0,0.021,158.0,0.0,0.0,1.43,12.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.8,0.2,0.0,0.2,41.0,0.0,1.5,0.0,0.0,84.0


Remember that the `like` parameter is case-sensitive: it keeps every label that contains the given string exactly as typed. If we change it to lower-case, we get a different set of rows.

In [145]:
nutrition.filter(like = 'octopus', axis=0)

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
"Mollusks, raw, common, octopus",100.0,82.0,1.0,0.2,48.0,230.0,65.0,16.0,0.0,2.1,0.5,0.04,0.03,150.0,45.0,0.0,0.0,0.0,0.0,0.0,20.0,0.36,5.0,0.0,1.2,1.2,0.1,53.0,0.435,5.3,30.0,0.025,186.0,350.0,44.8,1.68,14.91,0.902,1.088,1.438,0.196,2.027,0.933,0.286,0.0,0.649,1.049,1.114,0.336,0.534,0.608,0.668,0.642,0.167,0.477,0.651,2.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.04,0.227,0.162,0.239,48.0,0.0,1.6,0.0,0.0,80.25
"Mollusks, moist heat, cooked, common, octopus",100.0,164.0,2.1,0.5,96.0,460.0,81.0,24.0,0.0,3.78,0.9,0.076,0.057,300.0,90.0,0.0,0.0,0.0,0.0,0.0,36.0,0.648,8.0,0.0,1.2,1.2,0.1,106.0,0.739,9.54,60.0,0.047,279.0,630.0,89.6,3.36,29.82,1.804,2.176,2.877,0.391,4.056,1.866,0.573,0.0,1.298,2.099,2.228,0.673,1.069,1.217,1.336,1.283,0.334,0.954,1.303,4.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.08,0.453,0.324,0.477,96.0,0.0,3.2,0.0,0.0,60.5


What if we want to capture both? `filter()` also has a `regex` parameter that accepts a regular expression. The character class `[oO]` matches either a lower-case "o" or an upper-case "O", so `[oO]ctopus` finds "octopus" with either capitalization of its first letter.
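`filter(regex=...)` keeps a label if the pattern matches *anywhere* in it. It searches the label rather than requiring the whole label to match. A small sketch on hypothetical labels shows the character class at work, and shows how a `^` anchor restricts matches to labels that start with the pattern:

```python
import pandas as pd

foods = pd.DataFrame(
    {"calories": [56.0, 82.0, 381.0]},
    index=["Octopus (Alaska Native)", "Mollusks, raw, common, octopus", "Cornstarch"],
)

# [oO] matches exactly one character, either "o" or "O";
# the match can occur anywhere inside the label
both = foods.filter(regex="[oO]ctopus", axis=0)
print(both.index.tolist())

# Anchoring with ^ keeps only labels that *start* with the match
starts = foods.filter(regex="^[oO]ctopus", axis=0)
print(starts.index.tolist())  # ['Octopus (Alaska Native)']
```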

In [146]:
nutrition.filter(regex = '[oO]ctopus', axis = 0)

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
Octopus (Alaska Native),100.0,56.0,0.8,0.2,41.0,0.0,0.0,0.0,0.0,2.0,0.0,0.04,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,35.0,0.37,4.9,0.0,0.021,158.0,0.0,0.0,1.43,12.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.8,0.2,0.0,0.2,41.0,0.0,1.5,0.0,0.0,84.0
"Mollusks, raw, common, octopus",100.0,82.0,1.0,0.2,48.0,230.0,65.0,16.0,0.0,2.1,0.5,0.04,0.03,150.0,45.0,0.0,0.0,0.0,0.0,0.0,20.0,0.36,5.0,0.0,1.2,1.2,0.1,53.0,0.435,5.3,30.0,0.025,186.0,350.0,44.8,1.68,14.91,0.902,1.088,1.438,0.196,2.027,0.933,0.286,0.0,0.649,1.049,1.114,0.336,0.534,0.608,0.668,0.642,0.167,0.477,0.651,2.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.04,0.227,0.162,0.239,48.0,0.0,1.6,0.0,0.0,80.25
"Mollusks, moist heat, cooked, common, octopus",100.0,164.0,2.1,0.5,96.0,460.0,81.0,24.0,0.0,3.78,0.9,0.076,0.057,300.0,90.0,0.0,0.0,0.0,0.0,0.0,36.0,0.648,8.0,0.0,1.2,1.2,0.1,106.0,0.739,9.54,60.0,0.047,279.0,630.0,89.6,3.36,29.82,1.804,2.176,2.877,0.391,4.056,1.866,0.573,0.0,1.298,2.099,2.228,0.673,1.069,1.217,1.336,1.283,0.334,0.954,1.303,4.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.08,0.453,0.324,0.477,96.0,0.0,3.2,0.0,0.0,60.5


We can also tell the regular expression to ignore case entirely by putting the inline flag `(?i)` at the start of the pattern, which activates case-insensitive mode (this is slightly more advanced regex).
* http://www.rexegg.com/regex-quickstart.html
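The `(?i)` inline flag has the same effect as compiling a pattern with `re.IGNORECASE` in Python's `re` module. The sketch below uses hypothetical labels. For comparison, it also shows a different technique that gives the same rows here: a boolean mask built with `Index.str.contains(..., case=False)` instead of `filter()`.

```python
import re

import pandas as pd

foods = pd.DataFrame(
    {"calories": [56.0, 82.0, 381.0]},
    index=["Octopus (Alaska Native)", "Mollusks, raw, common, octopus", "Cornstarch"],
)

# (?i) at the start of the pattern == the re.IGNORECASE flag
assert re.search("(?i)octopus", "OCTOPUS") is not None
assert re.search("octopus", "OCTOPUS", flags=re.IGNORECASE) is not None

via_filter = foods.filter(regex="(?i)octopus", axis=0)

# Alternative (boolean mask, not filter()): case-insensitive str.contains
via_mask = foods[foods.index.str.contains("octopus", case=False)]

print(via_filter.equals(via_mask))  # True
```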

In [147]:
nutrition.filter(regex = '(?i)octopus', axis = 0)

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
Octopus (Alaska Native),100.0,56.0,0.8,0.2,41.0,0.0,0.0,0.0,0.0,2.0,0.0,0.04,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,35.0,0.37,4.9,0.0,0.021,158.0,0.0,0.0,1.43,12.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.8,0.2,0.0,0.2,41.0,0.0,1.5,0.0,0.0,84.0
"Mollusks, raw, common, octopus",100.0,82.0,1.0,0.2,48.0,230.0,65.0,16.0,0.0,2.1,0.5,0.04,0.03,150.0,45.0,0.0,0.0,0.0,0.0,0.0,20.0,0.36,5.0,0.0,1.2,1.2,0.1,53.0,0.435,5.3,30.0,0.025,186.0,350.0,44.8,1.68,14.91,0.902,1.088,1.438,0.196,2.027,0.933,0.286,0.0,0.649,1.049,1.114,0.336,0.534,0.608,0.668,0.642,0.167,0.477,0.651,2.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.04,0.227,0.162,0.239,48.0,0.0,1.6,0.0,0.0,80.25
"Mollusks, moist heat, cooked, common, octopus",100.0,164.0,2.1,0.5,96.0,460.0,81.0,24.0,0.0,3.78,0.9,0.076,0.057,300.0,90.0,0.0,0.0,0.0,0.0,0.0,36.0,0.648,8.0,0.0,1.2,1.2,0.1,106.0,0.739,9.54,60.0,0.047,279.0,630.0,89.6,3.36,29.82,1.804,2.176,2.877,0.391,4.056,1.866,0.573,0.0,1.298,2.099,2.228,0.673,1.069,1.217,1.336,1.283,0.334,0.954,1.303,4.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.08,0.453,0.324,0.477,96.0,0.0,3.2,0.0,0.0,60.5


Can we filter along both dimensions? After all, we care about the cholesterol in octopuses, and maybe serving size and calories, but not much else. Yes, we can do this 2D filtration!

We will chain a second `filter()` call onto the result and use its `items` parameter, which takes a list of the exact labels we want to keep.
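Unlike `like` and `regex`, `items` matches labels exactly. A label in the list that does not exist, such as a typo, is silently skipped rather than raising an error. A minimal sketch on a hypothetical two-row excerpt:

```python
import pandas as pd

foods = pd.DataFrame(
    {"serving_size_g": [100.0, 100.0],
     "calories": [56.0, 82.0],
     "cholesterol_mg": [41.0, 48.0],
     "sodium_mg": [0.0, 230.0]},
    index=["Octopus (Alaska Native)", "Mollusks, raw, common, octopus"],
)

# items keeps only the exact labels listed (no substring or regex matching)
picked = foods.filter(items=["cholesterol_mg", "calories"], axis=1)
print(sorted(picked.columns))      # ['calories', 'cholesterol_mg']

# A label that does not exist (here the typo "calorie") is dropped quietly
with_typo = foods.filter(items=["cholesterol_mg", "calorie"], axis=1)
print(with_typo.columns.tolist())  # ['cholesterol_mg']
```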

In [148]:
nutrition.filter(regex = '(?i)octopus', axis = 0).filter(items = ['cholesterol_mg', 'serving_size_g', 'calories'], axis = 1)

name,cholesterol_mg,serving_size_g,calories
Octopus (Alaska Native),41.0,100.0,56.0
"Mollusks, raw, common, octopus",48.0,100.0,82.0
"Mollusks, moist heat, cooked, common, octopus",96.0,100.0,164.0


This example was meant only to demonstrate what `filter()` can do. A more conventional way to get the same result is to chain `.loc[]` onto the row-filtered DataFrame and select the columns by label. The end result is the same.
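A small sketch, on a hypothetical three-row excerpt, checking that both routes give the same frame. It also shows the one practical difference: `.loc[]` raises a `KeyError` for a column label that does not exist, whereas `filter(items=...)` silently drops it. That makes `.loc[]` the safer choice when a misspelled label should be caught.

```python
import pandas as pd

foods = pd.DataFrame(
    {"serving_size_g": [100.0, 100.0, 100.0],
     "calories": [56.0, 82.0, 381.0],
     "cholesterol_mg": [41.0, 48.0, 0.0]},
    index=["Octopus (Alaska Native)", "Mollusks, raw, common, octopus", "Cornstarch"],
)
cols = ["serving_size_g", "calories", "cholesterol_mg"]

# Filter the rows first, then pick columns two different ways
octo = foods.filter(regex="(?i)octopus", axis=0)
via_filter = octo.filter(items=cols, axis=1)
via_loc = octo.loc[:, cols]
print(via_filter.equals(via_loc))  # True

# .loc refuses a missing label instead of silently skipping it
try:
    octo.loc[:, ["cholesterol_mg", "calorie"]]
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```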

In [149]:
nutrition.filter(regex = '(?i)octopus', axis = 0).loc[:, ['cholesterol_mg', 'serving_size_g', 'calories']]

name,cholesterol_mg,serving_size_g,calories
Octopus (Alaska Native),41.0,100.0,56.0
"Mollusks, raw, common, octopus",48.0,100.0,82.0
"Mollusks, moist heat, cooked, common, octopus",96.0,100.0,164.0


## Sorting with DataFrames: `sort_values()`
Now that our DataFrame is fully numeric, we are free to apply all of the numerical methods that we couldn't use before.

To refresh our memory, let's start by sorting by **vitamin B12**. Remember that when we access a column label as an attribute of a DataFrame, we get that column back as a Series.
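Attribute access (`df.col`) is shorthand for bracket access (`df["col"]`). It only works when the label is a valid Python identifier that does not clash with an existing DataFrame attribute or method. A sketch with hypothetical data; the `"total fat"` and `"count"` labels are invented for illustration.

```python
import pandas as pd

foods = pd.DataFrame(
    {"vitamin_b12_mcg": [0.0, 0.13, 20.0],
     "total fat": [0.1, 2.0, 1.0]},   # hypothetical label containing a space
    index=["Cornstarch", "Sherbet, orange", "Mollusks, raw, common, octopus"],
)

# Attribute access and bracket access return the same Series
b12_attr = foods.vitamin_b12_mcg
b12_brackets = foods["vitamin_b12_mcg"]
print(b12_attr.equals(b12_brackets))  # True
print(type(b12_attr).__name__)        # Series

# Labels that are not valid identifiers only work with brackets
print(foods["total fat"].max())       # 2.0

# A column named like a DataFrame method is shadowed by that method
counts = pd.DataFrame({"count": [1, 2]})
print(callable(counts.count))         # True: this is DataFrame.count
print(counts["count"].tolist())       # [1, 2]
```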

In [150]:
nutrition.vitamin_b12_mcg

name
Cornstarch                                                                                            0.00
Nuts, pecans                                                                                          0.00
Eggplant, raw                                                                                         0.00
Teff, uncooked                                                                                        0.00
Sherbet, orange                                                                                       0.13
                                                                                                      ... 
Beef, raw, all grades, trimmed to 0" fat, separable lean and fat, boneless, top round roast, round    1.64
Lamb, cooked, separable lean only, composite of trimmed retail cuts, frozen, imported, New Zealand    2.95
Lamb, raw, separable lean and fat, composite of trimmed retail cuts, frozen, imported, New Zealand    2.42
Beef, raw, all grades, trimmed t

Now let's sort by the vitamin B12 values using `.sort_values()`, which sorts in ascending order by default.
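A small sketch of `Series.sort_values()` on hypothetical values. The labels are shortened or invented, and `"Unmeasured item"` is a made-up row with a missing value. By default, missing values go to the end (`na_position="last"`) in either direction. `nlargest()` is a shortcut when only the top few rows are needed.

```python
import numpy as np
import pandas as pd

b12 = pd.Series(
    [0.0, 2.95, np.nan, 98.89, 0.13],
    index=["Cornstarch", "Lamb, cooked", "Unmeasured item", "Clam, cooked", "Sherbet, orange"],
    name="vitamin_b12_mcg",
)

# Ascending by default; NaN is placed last
print(b12.sort_values().index.tolist())
# ['Cornstarch', 'Sherbet, orange', 'Lamb, cooked', 'Clam, cooked', 'Unmeasured item']

# Descending puts the largest values first (NaN is still last)
print(b12.sort_values(ascending=False).head(2).index.tolist())
# ['Clam, cooked', 'Lamb, cooked']

# nlargest() returns the top n values directly, ignoring NaN
print(b12.nlargest(2).index.tolist())
# ['Clam, cooked', 'Lamb, cooked']
```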

In [151]:
nutrition.vitamin_b12_mcg.sort_values()

name
Cornstarch                                                                           0.00
Apricots, stewed, sulfured, dehydrated (low-moisture)                                0.00
Cocoa, processed with alkali, unsweetened, dry powder                                0.00
Tomato products, with herbs and cheese, sauce, canned                                0.00
Mothbeans, without salt, boiled, cooked, mature seeds                                0.00
                                                                                    ...  
Veal, braised, cooked, liver, variety meats and by-products                         84.60
Lamb, pan-fried, cooked, liver, variety meats and by-products                       85.70
Lamb, raw, liver, variety meats and by-products                                     90.05
Beef, boiled, cooked, variety meats and by-products liver, imported, New Zealand    96.00
Mollusks, moist heat, cooked, mixed species, clam                                   98.89
Name:

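`sort_values()` also works on DataFrames as a whole. However, since DataFrames are two-dimensional, we must tell it which column(s) to sort by with the `by` parameter; leaving `by` out raises a `TypeError`. The `axis` parameter defaults to `0`, which reorders the rows using the values in the named column(s); `axis=1` instead reorders the columns using the values in the named row(s).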
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html
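A minimal sketch of both axes on a hypothetical three-row excerpt, including the error raised when `by` is missing:

```python
import pandas as pd

foods = pd.DataFrame(
    {"calories": [381.0, 691.0, 25.0],
     "total_fat_g": [0.1, 72.0, 0.2]},
    index=["Cornstarch", "Nuts, pecans", "Eggplant, raw"],
)

# `by` is required: calling sort_values() without it raises TypeError
try:
    foods.sort_values()
    caught = False
except TypeError:
    caught = True
print(caught)  # True

# axis=0 (the default) reorders rows using the values in a column
by_cal = foods.sort_values(by="calories", ascending=False)
print(by_cal.index.tolist())        # ['Nuts, pecans', 'Cornstarch', 'Eggplant, raw']

# axis=1 reorders columns using the values in one row
cols_sorted = foods.sort_values(by="Cornstarch", axis=1)
print(cols_sorted.columns.tolist())  # ['total_fat_g', 'calories']
```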

In our example, we will sort the rows (the index) by caloric content, from highest to lowest (`ascending=False`).

In [152]:
nutrition.sort_values(axis = 0, by = 'calories', ascending=False).head()

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
"Fat, mutton tallow",100.0,902.0,100.0,47.0,102.0,0.0,79.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,28.0,2.8,2.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,100.0,47.3,40.6,7.8,102.0,0.0,0.0,0.0,0.0,0.0
"Fish oil, salmon",100.0,902.0,100.0,20.0,485.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,100.0,19.872,29.037,40.324,485.0,0.0,0.0,0.0,0.0,0.0
Lard,100.0,902.0,100.0,39.0,95.0,0.0,49.7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,102.0,0.6,0.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,100.0,39.2,45.1,11.2,95.0,0.0,0.0,0.0,0.0,0.0
"Fat, beef tallow",100.0,902.0,100.0,50.0,109.0,0.0,79.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,28.0,2.7,2.7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,100.0,49.8,41.8,4.0,109.0,0.0,0.0,0.0,0.0,0.0
"Fish oil, cod liver",100.0,902.0,100.0,23.0,570.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,100000.0,30000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,100.0,22.608,46.711,22.541,570.0,0.0,0.0,0.0,0.0,0.0


Unsurprisingly, the highest-calorie foods are pure fats (100 g of fat per 100 g serving). Not very interesting. Let's sort by **cholesterol** and then **sodium** instead, by passing a list of column labels to `by`.

This constitutes **sorting by two columns at once**. 

In [153]:
nutrition.sort_values(axis = 0, by = ['cholesterol_mg', 'sodium_mg'], ascending=False).head()

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
"Veal, braised, cooked, brain, variety meats and by-products",100.0,136.0,9.6,2.2,3100.0,156.0,0.0,3.0,0.0,2.43,1.0,0.2,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.65,0.17,13.0,0.0,0.0,0.0,0.0,16.0,0.26,1.67,16.0,0.038,385.0,214.0,11.0,1.61,11.48,0.591,0.629,0.974,0.12,1.373,0.504,0.287,0.0,0.467,0.886,0.711,0.252,0.604,0.474,0.589,0.568,0.115,0.445,0.546,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.63,2.18,1.74,1.49,3100.0,0.0,1.4,0.0,0.0,76.89
"Beef, simmered, cooked, brain, variety meats and by-products",100.0,151.0,11.0,2.4,3100.0,108.0,490.9,5.0,0.0,3.62,1.21,0.217,0.069,117.0,6.0,0.0,70.0,0.0,0.0,0.0,10.1,0.143,10.5,0.0,1.67,1.67,0.1,9.0,0.23,2.3,12.0,0.028,335.0,244.0,21.8,1.09,11.67,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.48,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10.53,2.394,1.882,1.632,3100.0,0.0,1.46,0.0,0.0,74.86
"Beef, raw, brain, variety meats and by-products",100.0,143.0,10.0,2.3,3010.0,126.0,0.0,3.0,0.0,3.55,2.01,0.199,0.092,147.0,7.0,0.0,88.0,0.0,0.0,0.0,9.51,0.226,10.7,0.0,0.99,0.99,0.0,43.0,0.287,2.55,13.0,0.026,362.0,274.0,21.3,1.02,10.86,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10.3,2.3,1.89,1.586,3010.0,0.0,1.51,0.0,0.0,76.29
"Lamb, soaked and fried, cooked, brains, imported, New Zealand",100.0,154.0,11.0,1.4,2559.0,101.0,0.0,0.0,0.0,2.995,1.906,0.219,0.084,8.0,2.0,0.0,0.0,0.0,0.0,0.0,9.54,0.081,0.0,0.0,1.12,1.12,0.0,6.0,0.283,1.21,15.0,0.036,384.0,258.0,16.3,1.33,14.03,0.697,0.944,0.0,0.184,1.96,0.6,0.297,0.0,0.628,1.083,1.225,0.458,0.553,0.45,0.478,0.669,0.155,0.486,0.714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10.92,1.365,4.168,0.999,2559.0,0.0,3.39,0.0,0.0,73.11
"Pork, braised, cooked, brain, variety meats and by-products, fresh",100.0,138.0,9.5,2.2,2552.0,91.0,0.0,4.0,0.0,3.33,1.823,0.223,0.078,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.42,0.14,14.0,0.0,0.0,0.0,0.0,9.0,0.263,1.82,12.0,0.085,220.0,195.0,18.5,1.48,12.14,0.66,0.635,1.214,0.214,1.42,0.583,0.326,0.0,0.561,1.058,0.954,0.241,0.618,0.66,0.641,0.567,0.155,0.509,0.691,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.51,2.15,1.72,1.47,2552.0,0.0,1.4,0.0,0.0,75.88


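What's actually happening when we sort by multiple columns? By default, the direction chosen by the `ascending` parameter applies to every column in the list. The rows are ordered by the first column, "cholesterol_mg", and the second column, "sodium_mg", is used only to break ties among rows with the same cholesterol value. For example, the veal and simmered beef brains both have 3100.0 mg of cholesterol, so sodium decides their order (156.0 mg before 108.0 mg in descending order). This also means a food with lower cholesterol but higher sodium still ranks below a food with higher cholesterol, since cholesterol comes first in the sort priority.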
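A minimal sketch of this tie-breaking. It uses the cholesterol and sodium values from the three brain rows in the output above, with the row labels shortened. Swapping the order of the `by` list changes which column has priority.

```python
import pandas as pd

# Values copied from the output above; labels shortened for readability
brains = pd.DataFrame(
    {"cholesterol_mg": [3010.0, 3100.0, 3100.0],
     "sodium_mg": [126.0, 108.0, 156.0]},
    index=["Beef, raw", "Beef, simmered", "Veal, braised"],
)

# Cholesterol decides first; sodium only breaks the 3100.0 tie
chol_first = brains.sort_values(by=["cholesterol_mg", "sodium_mg"], ascending=False)
print(chol_first.index.tolist())
# ['Veal, braised', 'Beef, simmered', 'Beef, raw']

# With sodium first, "Beef, raw" (126.0 mg) moves ahead of "Beef, simmered"
sodium_first = brains.sort_values(by=["sodium_mg", "cholesterol_mg"], ascending=False)
print(sodium_first.index.tolist())
# ['Veal, braised', 'Beef, raw', 'Beef, simmered']
```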

What if we want to sort two or more columns in different directions? We can pass a list of booleans to the `ascending` parameter, one for each column in `by`.

In [154]:
nutrition.sort_values(axis = 0, by = ['cholesterol_mg', 'sodium_mg'], ascending=[False, True]).head(5)

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
"Beef, simmered, cooked, brain, variety meats and by-products",100.0,151.0,11.0,2.4,3100.0,108.0,490.9,5.0,0.0,3.62,1.21,0.217,0.069,117.0,6.0,0.0,70.0,0.0,0.0,0.0,10.1,0.143,10.5,0.0,1.67,1.67,0.1,9.0,0.23,2.3,12.0,0.028,335.0,244.0,21.8,1.09,11.67,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.48,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10.53,2.394,1.882,1.632,3100.0,0.0,1.46,0.0,0.0,74.86
"Veal, braised, cooked, brain, variety meats and by-products",100.0,136.0,9.6,2.2,3100.0,156.0,0.0,3.0,0.0,2.43,1.0,0.2,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.65,0.17,13.0,0.0,0.0,0.0,0.0,16.0,0.26,1.67,16.0,0.038,385.0,214.0,11.0,1.61,11.48,0.591,0.629,0.974,0.12,1.373,0.504,0.287,0.0,0.467,0.886,0.711,0.252,0.604,0.474,0.589,0.568,0.115,0.445,0.546,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.63,2.18,1.74,1.49,3100.0,0.0,1.4,0.0,0.0,76.89
"Beef, raw, brain, variety meats and by-products",100.0,143.0,10.0,2.3,3010.0,126.0,0.0,3.0,0.0,3.55,2.01,0.199,0.092,147.0,7.0,0.0,88.0,0.0,0.0,0.0,9.51,0.226,10.7,0.0,0.99,0.99,0.0,43.0,0.287,2.55,13.0,0.026,362.0,274.0,21.3,1.02,10.86,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10.3,2.3,1.89,1.586,3010.0,0.0,1.51,0.0,0.0,76.29
"Lamb, soaked and fried, cooked, brains, imported, New Zealand",100.0,154.0,11.0,1.4,2559.0,101.0,0.0,0.0,0.0,2.995,1.906,0.219,0.084,8.0,2.0,0.0,0.0,0.0,0.0,0.0,9.54,0.081,0.0,0.0,1.12,1.12,0.0,6.0,0.283,1.21,15.0,0.036,384.0,258.0,16.3,1.33,14.03,0.697,0.944,0.0,0.184,1.96,0.6,0.297,0.0,0.628,1.083,1.225,0.458,0.553,0.45,0.478,0.669,0.155,0.486,0.714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10.92,1.365,4.168,0.999,2559.0,0.0,3.39,0.0,0.0,73.11
"Pork, braised, cooked, brain, variety meats and by-products, fresh",100.0,138.0,9.5,2.2,2552.0,91.0,0.0,4.0,0.0,3.33,1.823,0.223,0.078,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.42,0.14,14.0,0.0,0.0,0.0,0.0,9.0,0.263,1.82,12.0,0.085,220.0,195.0,18.5,1.48,12.14,0.66,0.635,1.214,0.214,1.42,0.583,0.326,0.0,0.561,1.058,0.954,0.241,0.618,0.66,0.641,0.567,0.155,0.509,0.691,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.51,2.15,1.72,1.47,2552.0,0.0,1.4,0.0,0.0,75.88


Compared to the unidirectional sort, sorting in two directions has moved *beef brain* to the top, followed by *veal brain*. Both have the same amount of cholesterol, but beef brain has less sodium. Cool!
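The same comparison can be reproduced on a three-row excerpt, with the cholesterol and sodium values copied from the output above and the labels shortened for readability. This sketch shows how the list passed to `ascending` pairs up with the list passed to `by`, one boolean per column:

```python
import pandas as pd

# Excerpt of the brain rows (values from the output above, labels shortened)
foods = pd.DataFrame(
    {"cholesterol_mg": [3100.0, 3100.0, 2552.0], "sodium_mg": [156.0, 108.0, 91.0]},
    index=["veal", "beef", "pork"],
)

# Same direction for both columns: the cholesterol tie goes to the higher-sodium food
both_desc = foods.sort_values(by=["cholesterol_mg", "sodium_mg"], ascending=False)

# One boolean per column: cholesterol descending, sodium ascending
mixed = foods.sort_values(by=["cholesterol_mg", "sodium_mg"], ascending=[False, True])

print(both_desc.index.tolist())  # ['veal', 'beef', 'pork']
print(mixed.index.tolist())      # ['beef', 'veal', 'pork']
```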

## Using Series `between()` with DataFrames

The `between()` method gives us a convenient way to select values that fall within a range. It is a **Series** method, so it must be applied to a one-dimensional Series, such as a single column of our DataFrame.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.between.html
* The method has you specify the boundaries, and by default it is inclusive of both boundary numbers.
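A quick sketch of the boundary behaviour, using made-up calorie values placed on and around the limits. It compares the default `between()` result with the equivalent explicit comparison and shows how a strictly-inside range can be written by hand. (Recent pandas versions, 1.3 and later, also accept an `inclusive` argument such as `'neither'`; the sketch avoids it so it does not depend on the installed version.)

```python
import pandas as pd

# Made-up calorie values placed on and around the 20-60 boundaries
calories = pd.Series([19.0, 20.0, 40.0, 60.0, 61.0], index=list("abcde"))

# Default behaviour: both boundaries count as "between"
inclusive_mask = calories.between(20, 60)

# The same test written out by hand (parentheses around each condition are required)
manual_mask = (calories >= 20) & (calories <= 60)

# A strictly-inside range, excluding both boundaries
exclusive_mask = (calories > 20) & (calories < 60)

print(inclusive_mask.tolist())  # [False, True, True, True, False]
print(exclusive_mask.tolist())  # [False, False, True, False, False]
```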

Let's play around with **calories**:

In [155]:
nutrition.calories.head(10)

name
Cornstarch            381.0
Nuts, pecans          691.0
Eggplant, raw          25.0
Teff, uncooked        367.0
Sherbet, orange       144.0
Cauliflower, raw       25.0
Taro leaves, raw       42.0
Lamb, raw, ground     282.0
Cheese, camembert     300.0
Vegetarian fillets    290.0
Name: calories, dtype: float64

Let's say we're looking for a low-calorie food, say between 20 and 60 calories per 100g serving (remember that the `nutrition` DataFrame is standardized to 100g). We do this by selecting the `calories` column (which gives us a Series) and then calling `between()` with the range of calories we want.

In [156]:
nutrition.calories.between(20, 60)

name
Cornstarch                                                                                            False
Nuts, pecans                                                                                          False
Eggplant, raw                                                                                          True
Teff, uncooked                                                                                        False
Sherbet, orange                                                                                       False
                                                                                                      ...  
Beef, raw, all grades, trimmed to 0" fat, separable lean and fat, boneless, top round roast, round    False
Lamb, cooked, separable lean only, composite of trimmed retail cuts, frozen, imported, New Zealand    False
Lamb, raw, separable lean and fat, composite of trimmed retail cuts, frozen, imported, New Zealand    False
Beef, raw, all grades, 

What is returned is a **Boolean Series** of the same length as the `calories` Series, telling us whether each food falls within the range of 20-60 calories. We can use it as a boolean mask to filter the original DataFrame for low-calorie foods: any food corresponding to a `True` in the mask is kept.
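The two steps, building the mask and then indexing with it, can be sketched on a tiny stand-in for `nutrition`. The calorie figures are borrowed from the `head(10)` output above; the `protein_g` values and the lowercase labels are made up for illustration:

```python
import pandas as pd

# Tiny stand-in for `nutrition`: calories from the output above, protein_g made up
foods = pd.DataFrame(
    {
        "calories": [381.0, 25.0, 367.0, 42.0],
        "protein_g": [0.3, 1.0, 13.3, 5.0],  # hypothetical values
    },
    index=pd.Index(["cornstarch", "eggplant", "teff", "taro leaves"], name="name"),
)

# Step 1: a Boolean Series sharing the DataFrame's index
low_cal_mask = foods["calories"].between(20, 60)

# Step 2: indexing the DataFrame with the mask keeps only the True rows (all columns)
low_cal_foods = foods[low_cal_mask]
print(low_cal_foods.index.tolist())  # ['eggplant', 'taro leaves']
```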

In [157]:
nutrition[nutrition.calories.between(20, 60)].head()

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
"Eggplant, raw",100.0,25.0,0.2,,0.0,2.0,6.9,22.0,0.0,0.649,0.281,0.037,0.039,23.0,1.0,0.0,14.0,0.0,36.0,0.0,0.0,0.084,2.2,0.0,0.3,0.3,3.5,9.0,0.081,0.23,14.0,0.232,24.0,229.0,0.3,0.16,0.98,0.051,0.057,0.164,0.006,0.186,0.041,0.023,0.0,0.045,0.064,0.047,0.011,0.043,0.043,0.042,0.037,0.009,0.027,0.053,5.88,3.0,3.53,1.54,0.0,1.58,0.0,0.0,0.26,0.18,0.034,0.016,0.076,0.0,0.0,0.66,0.0,0.0,92.3
"Cauliflower, raw",100.0,25.0,0.3,0.1,0.0,30.0,44.3,57.0,0.0,0.507,0.667,0.06,0.05,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.184,48.2,0.0,0.08,0.08,15.5,22.0,0.039,0.42,15.0,0.155,44.0,299.0,0.6,0.27,1.92,0.116,0.086,0.177,0.02,0.257,0.071,0.056,0.0,0.071,0.106,0.217,0.02,0.065,0.071,0.086,0.076,0.02,0.051,0.125,4.97,2.0,1.91,0.97,0.0,0.94,0.0,0.0,0.0,0.28,0.13,0.034,0.031,0.0,0.0,0.76,0.0,0.0,92.07
"Taro leaves, raw",100.0,42.0,0.7,0.2,0.0,3.0,12.8,126.0,0.0,1.513,0.084,0.456,0.209,4825.0,241.0,0.0,2895.0,0.0,1932.0,0.0,0.0,0.146,52.0,0.0,2.02,2.02,108.6,107.0,0.27,2.25,45.0,0.714,60.0,648.0,0.9,0.41,4.98,0.0,0.22,0.0,0.064,0.0,0.0,0.114,0.0,0.26,0.392,0.246,0.079,0.195,0.0,0.0,0.167,0.048,0.178,0.256,6.7,3.7,3.01,0.0,0.0,0.0,0.0,0.0,0.0,0.74,0.151,0.06,0.307,0.0,0.0,1.92,0.0,0.0,85.66
"PACE, Picante Sauce",100.0,25.0,0.0,,0.0,781.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,313.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.25,3.1,6.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.85,0.0,0.0,89.9
"Mango nectar, canned",100.0,51.0,0.1,,0.0,5.0,1.5,7.0,0.0,0.08,0.07,0.003,0.003,692.0,35.0,0.0,402.0,26.0,0.0,0.0,0.0,0.015,15.2,0.0,0.21,0.21,0.8,17.0,0.015,0.36,3.0,0.028,2.0,24.0,0.4,0.02,0.11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,13.12,0.3,12.45,5.56,0.0,5.31,0.0,0.56,1.02,0.06,0.014,0.022,0.011,0.0,0.0,0.08,0.0,0.0,86.63


The filtered DataFrame has over 1000 rows, one for each `True` in the boolean mask. We can make it more manageable by using `sample()`, which returns a random selection of rows.
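One thing to keep in mind: `sample()` picks rows at random, so rerunning the cell gives a different handful of foods each time. If you want the same rows on every run, `sample()` also accepts a `random_state` seed. A minimal sketch on a hypothetical 1,000-row DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame with 1,000 rows standing in for the filtered result
big = pd.DataFrame({"calories": range(1000)})

# sample(n) returns n randomly chosen rows; fixing random_state makes the
# choice reproducible from one call (or run) to the next
first = big.sample(4, random_state=0)
second = big.sample(4, random_state=0)

print(first.equals(second))  # True
print(len(first))            # 4
```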

In [158]:
nutrition[nutrition.calories.between(20, 60)].sample(4)

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
"Radicchio, raw",100.0,23.0,0.3,0.1,0.0,22.0,10.9,60.0,0.0,0.255,0.269,0.028,0.016,27.0,1.0,0.0,16.0,0.0,8832.0,0.0,0.0,0.057,8.0,0.0,2.26,2.26,255.2,19.0,0.341,0.57,13.0,0.138,40.0,302.0,0.9,0.62,1.43,0.0,0.105,0.0,0.0,0.0,0.0,0.024,0.0,0.085,0.062,0.056,0.008,0.034,0.0,0.0,0.04,0.026,0.0,0.065,4.48,0.9,0.6,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.06,0.01,0.11,0.0,0.0,0.7,0.0,0.0,93.14
"Nuts, boiled and steamed, japanese, chestnuts",100.0,56.0,0.2,,0.0,5.0,0.0,17.0,0.0,0.543,0.075,0.059,0.125,13.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.102,9.5,0.0,0.0,0.0,0.0,11.0,0.204,0.53,18.0,0.576,26.0,119.0,0.0,0.4,0.82,0.073,0.054,0.172,0.024,0.156,0.041,0.02,0.0,0.04,0.051,0.053,0.02,0.032,0.051,0.04,0.033,0.012,0.023,0.049,12.64,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.19,0.028,0.101,0.05,0.0,0.0,0.33,0.0,0.0,86.03
"Orange juice, with added calcium, diluted with 3 volume water, unsweetened, frozen concentrate",100.0,37.0,0.1,,0.0,4.0,5.0,19.0,0.0,0.273,0.0,0.044,0.069,66.0,3.0,5.0,13.0,48.0,83.0,0.0,0.0,0.065,36.2,0.0,0.15,0.15,0.1,147.0,0.021,0.08,10.0,0.0,78.0,158.0,0.1,0.04,0.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.47,0.2,7.42,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.015,0.012,0.016,0.0,0.0,0.81,0.0,0.0,90.07
"Beverages, prepared with water and ice, powder, with aspartame, reduced calorie, chocolate, dairy drink mix",100.0,29.0,0.2,0.2,2.0,61.0,0.0,3.0,0.0,0.11,0.188,0.17,0.01,61.0,18.0,0.0,0.0,0.0,1.0,0.0,0.18,0.01,0.5,0.0,0.0,0.0,0.1,127.0,0.079,0.68,19.0,0.064,78.0,197.0,1.9,0.32,2.19,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.51,0.8,2.89,0.0,0.0,0.0,0.0,0.0,0.0,0.23,0.164,0.043,0.005,2.0,0.0,0.81,2.0,73.0,92.26


## BONUS - `min()`, `max()`, `idxmin()`/`idxmax()`, and Good Foods

For DataFrames, we can use the `min()` and `max()` methods to find row-wise and column-wise minima and maxima.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.max.html
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.min.html

By default, calling `min()` on a DataFrame will give you the minimum value for each column.

In [159]:
nutrition.min()

serving_size_g     100.0
calories             0.0
total_fat_g          0.0
saturated_fat_g      0.1
cholesterol_mg       0.0
                   ...  
alcohol_g            0.0
ash_g                0.0
caffeine_mg          0.0
theobromine_mg       0.0
water_g              0.0
Length: 75, dtype: float64

We can also change the axis: with `axis = 1`, `min()` returns the minimum value within each row, taken across all of its columns. As we can see, each of the first five foods has at least one nutrient whose value is zero.
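The difference between the two axes is easier to see on a hypothetical two-food table where every value is known in advance:

```python
import pandas as pd

# Hypothetical two-food, three-nutrient table
foods = pd.DataFrame(
    {"calories": [381.0, 25.0], "sodium_mg": [9.0, 2.0], "fiber_g": [0.9, 3.0]},
    index=["cornstarch", "eggplant"],
)

# axis=0 (the default): one minimum per column, computed down the rows
print(foods.min().to_dict())
# {'calories': 25.0, 'sodium_mg': 2.0, 'fiber_g': 0.9}

# axis=1: one minimum per row, computed across that row's columns
print(foods.min(axis=1).to_dict())
# {'cornstarch': 0.9, 'eggplant': 2.0}
```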

In [160]:
nutrition.min(axis = 1).head(5)

name
Cornstarch         0.0
Nuts, pecans       0.0
Eggplant, raw      0.0
Teff, uncooked     0.0
Sherbet, orange    0.0
dtype: float64

That wasn't very practical. But we can use these methods to answer targeted questions. For example, say we want to know which food has the most potassium. We can do the following:

In [161]:
nutrition.potassium_mg.max()

16500.0

Which food is this? Let's use `idxmax()`, which returns the index label of the maximum value, to find out.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.idxmax.html
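`max()` gives the value, while `idxmax()` gives the index label where that value sits; `idxmin()` is the counterpart for the minimum. A small sketch using a few potassium and calorie figures copied from the outputs in this section (labels shortened):

```python
import pandas as pd

# Potassium (mg per 100 g) for three foods, copied from the outputs in this section
potassium = pd.Series(
    [16500.0, 229.0, 6300.0],
    index=["cream of tartar", "eggplant", "parsley, freeze-dried"],
)

print(potassium.max())     # 16500.0 (the value itself)
print(potassium.idxmax())  # cream of tartar (the index label of that value)
print(potassium.idxmin())  # eggplant

# On a DataFrame, idxmax() returns one label per column
foods = pd.DataFrame(
    {"potassium_mg": potassium.values, "calories": [258.0, 25.0, 271.0]},
    index=potassium.index,
)
print(foods.idxmax().to_dict())
# {'potassium_mg': 'cream of tartar', 'calories': 'parsley, freeze-dried'}
```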

In [162]:
nutrition.potassium_mg.idxmax()

'Leavening agents, cream of tartar'

Nobody is going to eat 100g of cream of tartar in one sitting :)

Let's take a look at the potassium values for all foods and sort them:

In [163]:
nutrition.potassium_mg.sort_values(ascending=False).head(10)

name
Leavening agents, cream of tartar                         16500.0
Leavening agents, low-sodium, baking powder               10100.0
Parsley, freeze-dried                                      6300.0
Beverages, powder, unsweetened, instant, tea               6040.0
Beverages, unsweetened, decaffeinated, instant, tea        6040.0
Spices, dried, chervil                                     4740.0
Spices, dried, coriander leaf                              4466.0
Celery flakes, dried                                       4388.0
Beverages, half the caffeine, regular, instant, coffee     3535.0
Beverages, powder, regular, instant, coffee                3535.0
Name: potassium_mg, dtype: float64

According to Harvard Health Publishing, the sodium-to-potassium ratio that reflects what our ancestors ate is 1 to 16. Let's find out which food in our list comes closest to this ratio.

Start by dividing our potassium column by our sodium column. Some foods have 0 potassium or 0 sodium, so we'll first replace all zeros with 1s in both Series. Otherwise pandas would not raise an error, but would silently return `inf` (or `NaN` for 0/0) for those foods.
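To see why the replacement matters, here is a sketch with hypothetical values. Dividing by zero in pandas does not raise an exception; it produces `inf` (or `NaN` for 0/0), which would distort the sorted ratios:

```python
import numpy as np
import pandas as pd

# Hypothetical potassium and sodium values, including zeros
potassium = pd.Series([0.0, 410.0, 0.0], index=["a", "b", "c"])
sodium = pd.Series([3.0, 0.0, 0.0], index=["a", "b", "c"])

# Plain division does not raise: x / 0 gives inf and 0 / 0 gives NaN
raw_ratio = potassium / sodium
print(raw_ratio.tolist())  # [0.0, inf, nan]

# Replacing zeros with ones first keeps every ratio finite
safe_ratio = potassium.replace(0, 1) / sodium.replace(0, 1)
print(safe_ratio.tolist())  # [0.3333333333333333, 410.0, 1.0]
```

The trade-off is that a food with zero sodium ends up with a ratio equal to its potassium value, so very large ratios are worth a second look.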

In [164]:
# Replace 0s with 1s in both columns so that no food divides by zero
K_to_Na = nutrition.potassium_mg.replace(0, 1) / nutrition.sodium_mg.replace(0, 1)
K_to_Na

name
Cornstarch                                                                                              0.333333
Nuts, pecans                                                                                          410.000000
Eggplant, raw                                                                                         114.500000
Teff, uncooked                                                                                         35.583333
Sherbet, orange                                                                                         2.086957
                                                                                                         ...    
Beef, raw, all grades, trimmed to 0" fat, separable lean and fat, boneless, top round roast, round      5.759259
Lamb, cooked, separable lean only, composite of trimmed retail cuts, frozen, imported, New Zealand      3.760000
Lamb, raw, separable lean and fat, composite of trimmed retail cuts, frozen, imported, New 

Great, let's now sort the result in descending order.

In [165]:
K_to_Na = K_to_Na.sort_values(ascending=False)
K_to_Na

name
Peanut flour, low fat                                         1358.000000
Nuts, raw, pistachio nuts                                     1025.000000
Beverages, reduced calorie, with whitener, instant, coffee     909.000000
Soybeans, raw, mature seeds                                    898.500000
Soy meal, raw, defatted                                        830.000000
                                                                 ...     
Seasoning mix, original, chili, dry                              0.000217
Salt, table                                                      0.000206
PACE, Dry Taco Seasoning Mix                                     0.000124
Seasoning mix, coriander & annatto, sazon, dry                   0.000059
Leavening agents, baking soda                                    0.000037
Length: 8789, dtype: float64

It looks like nuts and soy products have a large amount of potassium relative to sodium. Which foods sit near the target ratio of 16 parts potassium to 1 part sodium?

To find these, we can index with the `between()` method. We apply `between()` to find the foods whose ratio is between 14 and 18 (16 plus or minus 2), subset the `K_to_Na` Series to those foods only, and then take a random sample of 15 from it.
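The filter works on `K_to_Na` directly because it is a Series: the mask is computed from the same Series it filters, so the index labels line up. A sketch with made-up ratios and placeholder labels, writing the window as a target plus or minus a tolerance:

```python
import pandas as pd

# Hypothetical potassium-to-sodium ratios indexed by placeholder food labels
ratios = pd.Series([1358.0, 16.0, 14.2, 0.0002, 18.5], index=["v", "w", "x", "y", "z"])

target, tolerance = 16, 2
# between() is inclusive by default, so 14 and 18 themselves would be kept
near_target = ratios[ratios.between(target - tolerance, target + tolerance)]
print(near_target.to_dict())  # {'w': 16.0, 'x': 14.2}
```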

In [166]:
K_to_Na[K_to_Na.between(14,18)].sample(15)

name
Babyfood, mango with tapioca, fruit dessert                                        16.500000
Fish, raw, wild, rainbow, trout                                                    15.516129
Alcoholic beverage, white, table, wine                                             14.200000
Tomato products, without salt added, puree, canned                                 15.678571
Beans, without salt, drained, boiled, cooked, frozen, yellow, snap                 14.000000
Fish, raw, spot                                                                    17.103448
Beverages, high vitamin C, greater than 3% juice, fruit juice drink                15.250000
Corn, raw, white, sweet                                                            18.000000
Tomato products, without salt added, paste, canned                                 17.186441
Asparagus, drained, boiled, cooked                                                 16.000000
Peppers, freeze-dried, red, sweet                                

Cool stuff! With some exceptions, the foods on this list appear to be ones that we would typically think of as healthy, such as vegetables, fish, and beans.

## DataFrame `nlargest()` and `nsmallest()` Methods

Recall from the previous lecture that we found the foods with the highest potassium by calling the `sort_values()` method and chaining on the `head()` method. That works fine.

However, an easier way to do this is to use the `nlargest()` method. It works like the Series method of the same name, but here we apply it to a DataFrame.
* Remember that since DataFrames are two-dimensional, you must specify which column(s) you want to find the largest values for.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.nlargest.html
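A minimal sketch comparing `nlargest()` with the `sort_values()` plus `head()` route, on a four-row table whose potassium and calorie figures are copied from this section's outputs (labels shortened). With no ties in the ranked column, both routes return the same rows in the same order:

```python
import pandas as pd

# Four foods with potassium and calorie figures from this section's outputs
foods = pd.DataFrame(
    {"potassium_mg": [229.0, 16500.0, 6300.0, 4740.0], "calories": [25.0, 258.0, 271.0, 237.0]},
    index=["eggplant", "cream of tartar", "parsley", "chervil"],
)

# nlargest needs to be told which column(s) to rank by
top_two = foods.nlargest(n=2, columns="potassium_mg")

# The longer route gives the same rows here
via_sort = foods.sort_values(by="potassium_mg", ascending=False).head(2)

print(top_two.index.tolist())    # ['cream of tartar', 'parsley']
print(top_two.equals(via_sort))  # True
```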

In [170]:
nutrition.nlargest(n = 10, columns = 'potassium_mg')

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
"Leavening agents, cream of tartar",100.0,258.0,0.0,,0.0,52.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,0.195,3.72,2.0,0.205,5.0,16500.0,0.2,0.42,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,61.5,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,36.8,0.0,0.0,1.7
"Leavening agents, low-sodium, baking powder",100.0,97.0,0.4,0.1,0.0,90.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4332.0,0.019,8.17,29.0,0.42,6869.0,10100.0,0.2,0.72,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,46.9,2.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.073,0.006,0.121,0.0,0.0,46.4,0.0,0.0,6.2
"Parsley, freeze-dried",100.0,271.0,5.2,,0.0,391.0,0.0,194.0,0.0,10.4,2.516,2.26,1.04,63240.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.375,149.0,0.0,0.0,0.0,0.0,176.0,0.459,53.9,372.0,1.338,548.0,6300.0,32.3,6.11,31.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.115,0.21,0.0,0.0,0.0,0.0,0.516,0.0,0.0,42.38,32.7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.2,0.0,0.0,0.0,0.0,0.0,19.12,0.0,0.0,2.0
"Beverages, powder, unsweetened, instant, tea",100.0,315.0,0.0,,0.0,72.0,118.3,103.0,0.0,10.8,4.53,0.985,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.356,0.0,0.0,0.0,0.0,0.0,118.0,0.55,2.26,272.0,133.0,239.0,6040.0,5.3,1.69,20.21,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,58.66,8.5,5.53,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,16.04,5714.0,71.0,5.09
"Beverages, unsweetened, decaffeinated, instant, tea",100.0,315.0,0.0,,0.0,72.0,118.3,103.0,0.0,10.8,4.53,0.985,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.356,0.0,0.0,0.0,0.0,0.0,118.0,0.55,2.26,272.0,133.0,239.0,6040.0,5.3,1.69,20.21,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,58.66,8.5,5.53,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,16.04,169.0,11.0,5.09
"Spices, dried, chervil",100.0,237.0,3.9,0.2,0.0,83.0,0.0,274.0,0.0,5.4,0.0,0.68,0.38,5850.0,293.0,0.0,0.0,0.0,0.0,0.0,0.0,0.93,50.0,0.0,0.0,0.0,0.0,1346.0,0.44,31.95,130.0,2.1,450.0,4740.0,29.3,8.8,23.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,49.1,11.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.9,0.169,1.399,1.8,0.0,0.0,16.6,0.0,0.0,7.2
"Spices, dried, coriander leaf",100.0,279.0,4.8,0.1,0.0,211.0,97.1,274.0,0.0,10.707,0.0,1.5,1.252,5850.0,293.0,31.0,3407.0,175.0,2428.0,0.0,0.0,0.61,566.7,0.0,1.03,1.03,1359.5,1246.0,1.786,42.46,694.0,6.355,481.0,4466.0,29.3,4.72,21.93,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,52.1,10.4,7.27,0.0,0.0,0.0,0.0,0.0,0.0,4.78,0.115,2.232,0.328,0.0,0.0,14.08,0.0,0.0,7.3
"Celery flakes, dried",100.0,319.0,2.1,0.6,0.0,1435.0,122.3,107.0,0.0,4.64,0.0,0.5,0.44,1962.0,98.0,0.0,1177.0,0.0,5076.0,0.0,0.0,0.46,86.5,0.0,5.55,5.55,584.2,587.0,0.571,7.83,196.0,0.0,402.0,4388.0,15.3,2.77,11.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,63.7,27.8,35.9,0.0,0.0,0.0,0.0,0.0,0.0,2.1,0.555,0.405,1.035,0.0,0.0,13.9,0.0,0.0,9.0
"Beverages, powder, regular, instant, coffee",100.0,353.0,0.5,0.2,0.0,37.0,101.9,0.0,0.0,28.173,0.097,0.074,0.008,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029,0.0,0.0,0.0,0.0,1.9,141.0,0.139,4.41,327.0,1.712,303.0,3535.0,12.6,0.35,12.2,0.335,0.053,0.478,0.202,2.03,0.441,0.165,0.0,0.172,0.478,0.096,0.023,0.262,0.351,0.126,0.142,0.03,0.165,0.276,75.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.197,0.041,0.196,0.0,0.0,8.8,3142.0,0.0,3.1
"Beverages, half the caffeine, regular, instant, coffee",100.0,352.0,0.5,0.2,0.0,37.0,101.9,0.0,0.0,28.173,0.097,0.074,0.008,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029,0.0,0.0,0.0,0.0,1.9,141.0,0.139,4.41,327.0,1.712,303.0,3535.0,12.6,0.35,14.42,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,73.18,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.197,0.041,0.196,0.0,0.0,8.8,1571.0,0.0,3.1


Note that this returned a DataFrame containing all columns, sorted in descending order by amount of potassium. We can reduce the output to just the **potassium_mg** column by accessing it as an attribute.


In [171]:
nutrition.nlargest(n = 10, columns = 'potassium_mg').potassium_mg

name
Leavening agents, cream of tartar                         16500.0
Leavening agents, low-sodium, baking powder               10100.0
Parsley, freeze-dried                                      6300.0
Beverages, powder, unsweetened, instant, tea               6040.0
Beverages, unsweetened, decaffeinated, instant, tea        6040.0
Spices, dried, chervil                                     4740.0
Spices, dried, coriander leaf                              4466.0
Celery flakes, dried                                       4388.0
Beverages, powder, regular, instant, coffee                3535.0
Beverages, half the caffeine, regular, instant, coffee     3535.0
Name: potassium_mg, dtype: float64

Square-bracket column selection would also work (as would `.loc[:, 'potassium_mg']`):

In [172]:
nutrition.nlargest(n = 10, columns = 'potassium_mg')['potassium_mg']

name
Leavening agents, cream of tartar                         16500.0
Leavening agents, low-sodium, baking powder               10100.0
Parsley, freeze-dried                                      6300.0
Beverages, powder, unsweetened, instant, tea               6040.0
Beverages, unsweetened, decaffeinated, instant, tea        6040.0
Spices, dried, chervil                                     4740.0
Spices, dried, coriander leaf                              4466.0
Celery flakes, dried                                       4388.0
Beverages, powder, regular, instant, coffee                3535.0
Beverages, half the caffeine, regular, instant, coffee     3535.0
Name: potassium_mg, dtype: float64

But this seems sort of verbose, does it not? Shouldn't there be an easier way to do this? 

Yes: we can simply select the column first (`potassium_mg` in this case) to get a Series, and then call `nlargest()` on that Series.
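The three routes in this lecture return the same Series. A sketch on a hypothetical four-row table to confirm that attribute access, bracket notation and `Series.nlargest()` agree:

```python
import pandas as pd

# Hypothetical table with a potassium column to rank by
foods = pd.DataFrame(
    {"potassium_mg": [229.0, 16500.0, 6300.0, 4740.0], "calories": [25.0, 258.0, 271.0, 237.0]},
    index=["eggplant", "cream of tartar", "parsley", "chervil"],
)

# 1) DataFrame.nlargest, then pick the column with attribute access
a = foods.nlargest(n=2, columns="potassium_mg").potassium_mg
# 2) Same, but with bracket notation
b = foods.nlargest(n=2, columns="potassium_mg")["potassium_mg"]
# 3) Select the column first, then call Series.nlargest
c = foods["potassium_mg"].nlargest(2)

print(a.equals(b) and b.equals(c))  # True
```

The third form only ranks one column, so it skips building the full multi-column DataFrame first.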

In [175]:
nutrition.potassium_mg.nlargest(10)

name
Leavening agents, cream of tartar                         16500.0
Leavening agents, low-sodium, baking powder               10100.0
Parsley, freeze-dried                                      6300.0
Beverages, powder, unsweetened, instant, tea               6040.0
Beverages, unsweetened, decaffeinated, instant, tea        6040.0
Spices, dried, chervil                                     4740.0
Spices, dried, coriander leaf                              4466.0
Celery flakes, dried                                       4388.0
Beverages, powder, regular, instant, coffee                3535.0
Beverages, half the caffeine, regular, instant, coffee     3535.0
Name: potassium_mg, dtype: float64

The exact same mechanics apply to the `nsmallest()` method. In this example, we pass in multiple columns: the method ranks the rows by `sodium_mg` first, uses `calories` to break ties, then `folate_mcg` to break any remaining ties, and returns the `n` smallest rows as a DataFrame.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.nsmallest.html
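A sketch of the tie-breaking with made-up foods: three of them tie on `sodium_mg`, so `calories` decides which of those make the cut, and `folate_mcg` then orders the two that are still tied. The returned rows come back sorted by the listed columns:

```python
import pandas as pd

# Hypothetical foods: three tie on sodium_mg, two of those also tie on calories
foods = pd.DataFrame(
    {
        "sodium_mg": [0.0, 0.0, 0.0, 5.0],
        "calories": [0.0, 0.0, 12.0, 0.0],
        "folate_mcg": [3.0, 1.0, 0.0, 0.0],
    },
    index=["water_a", "water_b", "juice", "broth"],
)

# Rank by sodium first; calories break sodium ties; folate breaks any remaining ties
lowest = foods.nsmallest(n=2, columns=["sodium_mg", "calories", "folate_mcg"])
print(lowest.index.tolist())  # ['water_b', 'water_a']
```

`juice` also has zero sodium, but it loses the calorie tie-break and is left out.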

In [180]:
nutrition.nsmallest(n = 5, columns = ['sodium_mg', 'calories', 'folate_mcg'])

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
"Beverages, well, tap, water",100.0,0.0,0.0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,99.9
"Water, NAYA, non-carbonated, bottled",100.0,0.0,0.0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,100.0
"Beverages, decaffeinated, brewed, green, tea",100.0,0.0,0.0,,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.058,0.007,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005,0.0,0.0,0.0,0.0,0.0,0.0,0.005,0.0,1.0,0.0,0.0,15.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,99.93
"Beverages, EVIAN, non-carbonated, bottled, water",100.0,0.0,0.0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,99.97
"Beverages, CALISTOGA, non-carbonated, bottled, water",100.0,0.0,0.0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,100.0


Consider using `nsmallest()` or `nlargest()` whenever you need the smallest or largest values of specific columns. They are optimized for this task. As a result, they are generally faster than sorting the whole DataFrame yourself and chaining `head()` onto the result.
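Here is a minimal sketch of the tie-breaking behaviour, using a small made-up DataFrame. The row labels and values are hypothetical. According to the pandas documentation, `DataFrame.nsmallest()` is equivalent to `sort_values(columns).head(n)` but more performant, so the two calls below should select the same rows in the same order.

```python
import pandas as pd

# Hypothetical toy data standing in for the nutrition DataFrame
df = pd.DataFrame(
    {"sodium_mg": [5.0, 0.0, 0.0, 2.0, 0.0],
     "calories": [10.0, 3.0, 1.0, 0.0, 2.0]},
    index=["a", "b", "c", "d", "e"],
)

# Ranked by sodium_mg first; the three 0.0-sodium rows are then ordered by calories
fast = df.nsmallest(n=3, columns=["sodium_mg", "calories"])

# The same rows obtained with an explicit sort followed by head()
slow = df.sort_values(by=["sodium_mg", "calories"]).head(3)

print(fast.index.tolist())  # ['c', 'e', 'b']
print(fast.equals(slow))    # True
```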

## Skill Challenge: Vitamin B12

Instructions
1. Find the 10 foods that have the most Vitamin B12. What do these foods have in common?
2. Isolate the foods in the dataset that contain, or are based on, eggplant. Which of them has the most sodium?
3. Select a slice of the DataFrame that contains 4 random rows and 2 random columns.


### Part 1
The top 10 foods with the most Vitamin B12 can be found using the `nlargest()` method.

In [185]:
nutrition.vitamin_b12_mcg.nlargest(n=10)

name
Mollusks, moist heat, cooked, mixed species, clam                                   98.89
Beef, boiled, cooked, variety meats and by-products liver, imported, New Zealand    96.00
Lamb, raw, liver, variety meats and by-products                                     90.05
Lamb, pan-fried, cooked, liver, variety meats and by-products                       85.70
Veal, braised, cooked, liver, variety meats and by-products                         84.60
Beef, raw, liver, variety meats and by-products, imported, New Zealand              84.50
Beef, pan-fried, cooked, liver, variety meats and by-products                       83.13
Lamb, braised, cooked, kidneys, variety meats and by-products                       78.90
Lamb, braised, cooked, liver, variety meats and by-products                         76.50
Veal, pan-fried, cooked, liver, variety meats and by-products                       72.50
Name: vitamin_b12_mcg, dtype: float64

An alternative, equivalent approach selects the column with `.loc`:

In [199]:
nutrition.loc[:, 'vitamin_b12_mcg'].nlargest(n=10)

name
Mollusks, moist heat, cooked, mixed species, clam                                   98.89
Beef, boiled, cooked, variety meats and by-products liver, imported, New Zealand    96.00
Lamb, raw, liver, variety meats and by-products                                     90.05
Lamb, pan-fried, cooked, liver, variety meats and by-products                       85.70
Veal, braised, cooked, liver, variety meats and by-products                         84.60
Beef, raw, liver, variety meats and by-products, imported, New Zealand              84.50
Beef, pan-fried, cooked, liver, variety meats and by-products                       83.13
Lamb, braised, cooked, kidneys, variety meats and by-products                       78.90
Lamb, braised, cooked, liver, variety meats and by-products                         76.50
Veal, pan-fried, cooked, liver, variety meats and by-products                       72.50
Name: vitamin_b12_mcg, dtype: float64

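All ten foods are animal products. Nine are liver or kidney ("variety meats and by-products") from beef, lamb, or veal, and the top entry is clams. These foods are likely also protein-rich.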
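One way to check the protein guess is to call `nlargest()` on the DataFrame instead of on a single column. The DataFrame version keeps every column, so `protein_g` can be read alongside `vitamin_b12_mcg`. The sketch below uses a hypothetical four-row DataFrame with made-up values, not the real dataset.

```python
import pandas as pd

# Hypothetical values for illustration only; not taken from the nutrition dataset
foods = pd.DataFrame(
    {"vitamin_b12_mcg": [98.0, 0.0, 90.0, 1.0],
     "protein_g": [25.0, 3.0, 20.0, 8.0]},
    index=["food_a", "food_b", "food_c", "food_d"],
)

# DataFrame.nlargest returns whole rows, so related nutrients stay visible
top = foods.nlargest(n=2, columns="vitamin_b12_mcg")[["vitamin_b12_mcg", "protein_g"]]
print(top)
```

On the real data, the analogous call would be `nutrition.nlargest(n=10, columns='vitamin_b12_mcg')[['vitamin_b12_mcg', 'protein_g']]`.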

### Part 2
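Eggplant-based foods can be found using the `filter()` method, with a regular expression applied to the row labels (`axis = 0`). The pattern `[eE]ggplant` matches both "Eggplant" and "eggplant".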
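`filter(regex=...)` keeps every label in which the regular expression is found. A roughly equivalent approach builds a boolean mask from the index with `str.contains()`. Passing `case=False` makes the match case-insensitive without a character class. This is a sketch on a three-row stand-in DataFrame; the sodium values are copied from the outputs in this section.

```python
import pandas as pd

# Small stand-in for the nutrition DataFrame, indexed by food name
df = pd.DataFrame(
    {"sodium_mg": [2.0, 1674.0, 38.0]},
    index=pd.Index(["Eggplant, raw", "Eggplant, pickled", "Kale, raw"], name="name"),
)

# Regex filter on the row labels, as in the notebook
by_filter = df.filter(regex="[eE]ggplant", axis=0)

# Boolean mask from the index; case=False ignores capitalisation
mask = df.index.str.contains("eggplant", case=False)
by_mask = df.loc[mask]

print(by_filter.equals(by_mask))  # True
```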

In [192]:
nutrition.filter(regex = '[eE]ggplant', axis = 0)

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
"Eggplant, raw",100.0,25.0,0.2,,0.0,2.0,6.9,22.0,0.0,0.649,0.281,0.037,0.039,23.0,1.0,0.0,14.0,0.0,36.0,0.0,0.0,0.084,2.2,0.0,0.3,0.3,3.5,9.0,0.081,0.23,14.0,0.232,24.0,229.0,0.3,0.16,0.98,0.051,0.057,0.164,0.006,0.186,0.041,0.023,0.0,0.045,0.064,0.047,0.011,0.043,0.043,0.042,0.037,0.009,0.027,0.053,5.88,3.0,3.53,1.54,0.0,1.58,0.0,0.0,0.26,0.18,0.034,0.016,0.076,0.0,0.0,0.66,0.0,0.0,92.3
"Eggplant, pickled",100.0,49.0,0.7,0.1,0.0,1674.0,11.9,20.0,0.0,0.66,0.0,0.07,0.05,50.0,3.0,0.0,30.0,0.0,0.0,0.0,0.0,0.14,0.0,0.0,0.03,0.03,3.7,25.0,0.173,0.77,6.0,0.0,9.0,12.0,0.6,0.23,0.9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.77,2.5,4.8,0.0,0.0,0.0,0.0,0.0,0.0,0.7,0.14,0.063,0.294,0.0,0.0,1.73,0.0,0.0,86.9
"Eggplant, with salt, drained, boiled, cooked",100.0,33.0,0.2,,0.0,239.0,9.4,14.0,0.0,0.6,0.075,0.02,0.076,37.0,2.0,0.0,22.0,0.0,0.0,0.0,0.0,0.086,1.3,0.0,0.41,0.41,2.9,6.0,0.059,0.25,11.0,0.113,15.0,123.0,0.1,0.12,0.83,0.042,0.046,0.134,0.004,0.152,0.033,0.019,0.0,0.036,0.052,0.039,0.009,0.035,0.034,0.034,0.03,0.008,0.022,0.043,8.14,2.5,3.2,0.0,0.0,0.0,0.0,0.0,0.0,0.23,0.044,0.02,0.093,0.0,0.0,1.13,0.0,0.0,89.67
"Eggplant, without salt, drained, boiled, cooked",100.0,35.0,0.2,,0.0,1.0,9.4,14.0,0.0,0.6,0.075,0.02,0.076,37.0,2.0,0.0,22.0,0.0,0.0,0.0,0.0,0.086,1.3,0.0,0.41,0.41,2.9,6.0,0.059,0.25,11.0,0.113,15.0,123.0,0.1,0.12,0.83,0.042,0.046,0.134,0.004,0.152,0.033,0.019,0.0,0.036,0.052,0.039,0.009,0.035,0.034,0.034,0.03,0.008,0.022,0.043,8.73,2.5,3.2,0.0,0.0,0.0,0.0,0.0,0.0,0.23,0.044,0.02,0.093,0.0,0.0,0.54,0.0,0.0,89.67


To find the one with the most sodium, we can chain on `sort_values()` by *sodium_mg* in descending order and then select the first entry with `head(1)`.

In [201]:
nutrition.filter(regex = '[eE]ggplant', axis = 0).sort_values(by = 'sodium_mg', ascending=False).head(1)

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
"Eggplant, pickled",100.0,49.0,0.7,0.1,0.0,1674.0,11.9,20.0,0.0,0.66,0.0,0.07,0.05,50.0,3.0,0.0,30.0,0.0,0.0,0.0,0.0,0.14,0.0,0.0,0.03,0.03,3.7,25.0,0.173,0.77,6.0,0.0,9.0,12.0,0.6,0.23,0.9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.77,2.5,4.8,0.0,0.0,0.0,0.0,0.0,0.0,0.7,0.14,0.063,0.294,0.0,0.0,1.73,0.0,0.0,86.9


The eggplant-based food with the most sodium is pickled eggplant. Makes sense, doesn't it?
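If only the name of the saltiest food is needed, `idxmax()` on the `sodium_mg` column returns the index label of the maximum directly. It returns a label, not a one-row DataFrame. This is a sketch on a stand-in DataFrame holding just the four eggplant rows. Their sodium values are copied from the output above.

```python
import pandas as pd

# Stand-in for the eggplant subset; sodium_mg values taken from the output above
eggplant = pd.DataFrame(
    {"sodium_mg": [2.0, 1674.0, 239.0, 1.0]},
    index=["Eggplant, raw", "Eggplant, pickled",
           "Eggplant, with salt, drained, boiled, cooked",
           "Eggplant, without salt, drained, boiled, cooked"],
)

# Sort-based approach from the notebook: a one-row DataFrame
top_row = eggplant.sort_values(by="sodium_mg", ascending=False).head(1)

# idxmax() returns only the label of the largest value
top_label = eggplant["sodium_mg"].idxmax()

print(top_label)  # Eggplant, pickled
```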

### Part 3


Selecting four random rows and two random columns is a simple matter of chaining two calls to the `sample()` method. The first call samples rows (`axis = 0`), and the second samples columns (`axis = 1`).
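Because `sample()` draws at random, each run returns a different slice. Passing `random_state` fixes the draw so the result can be reproduced. The sketch below uses a hypothetical 10 x 5 DataFrame in place of `nutrition`. The seed value `0` is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Hypothetical 10 x 5 frame standing in for the nutrition DataFrame
df = pd.DataFrame(np.arange(50).reshape(10, 5), columns=list("abcde"))

# First call samples 4 rows, second samples 2 columns; random_state fixes each draw
first = df.sample(n=4, axis=0, random_state=0).sample(n=2, axis=1, random_state=0)
second = df.sample(n=4, axis=0, random_state=0).sample(n=2, axis=1, random_state=0)

print(first.shape)           # (4, 2)
print(first.equals(second))  # True
```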

In [202]:
nutrition.sample(n = 4, axis = 0).sample(n = 2, axis = 1)

name,lysine_g,lutein_zeaxanthin_mcg
"Mollusks, moist heat, cooked, Pacific, oyster",1.412,0.0
"Pizza, cooked, frozen, rising crust, cheese topping",0.649,24.0
"Soup, canned, HEALTHY CHOICE Chicken Noodle Soup",0.0,0.0
"Turkey, raw, meat and skin, whole",1.746,0.0


## Skill Challenge: More Food Stuff
1. Remove all food items that contain at least one NaN. Do this in a way that persists the change (save the resulting DataFrame). Determine how many food items remain after the exclusions.
2. From the remaining records, isolate those that have between 20 and 40 mg of Vitamin C. Of these foods, which one has the fewest calories?
3. Determine how many food items in the DataFrame have Vitamin C levels between 2 and 3 standard deviations (inclusive) above the mean.

### Part 1
We will use the `dropna()` method to drop all rows that have one or more `NaN` values.

In [203]:
nutrition_non_NA = nutrition.dropna(axis = 0, how = 'any')

Then we use the `shape` attribute to determine how many rows remain.

In [206]:
nutrition_non_NA.shape

(7199, 75)

There are still 7199 records after dropping all foods with at least one `NaN` value.
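If only the row count is needed, `shape[0]` or `len()` gives it directly. Subtracting it from the original length shows how many rows were dropped. This is a sketch on a hypothetical four-row DataFrame with a few missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing values
df = pd.DataFrame(
    {"calories": [25.0, 49.0, np.nan, 33.0],
     "saturated_fat_g": [np.nan, 0.1, 0.1, np.nan]},
)

clean = df.dropna(axis=0, how="any")  # drop rows containing at least one NaN
print(clean.shape[0], len(clean))     # 1 1
print(len(df) - len(clean))           # 3 rows removed
```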

### Part 2
To isolate the records with between 20 and 40 mg of vitamin C, we will use the `between()` method. It is a Series method, so we first need to extract the `vitamin_c_mg` column. This returns a boolean Series that is `True` for entries whose vitamin C values are between 20 and 40, inclusive.
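By default, `between()` includes both endpoints. The `inclusive` parameter changes this. Its string form (`"both"`, `"neither"`, `"left"`, `"right"`) assumes pandas 1.3 or later. This is a sketch on a hypothetical five-value Series.

```python
import pandas as pd

vit_c = pd.Series([19.9, 20.0, 30.0, 40.0, 40.1])

# Default: both endpoints count as "between"
both = vit_c.between(20, 40)
print(both.tolist())     # [False, True, True, True, False]

# Excluding the endpoints (string form of `inclusive` needs pandas 1.3 or later)
neither = vit_c.between(20, 40, inclusive="neither")
print(neither.tolist())  # [False, False, True, False, False]
```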

In [210]:
vitC_20_40 = nutrition_non_NA.vitamin_c_mg.between(20, 40)

Let's take a quick look at this Series:

In [211]:
vitC_20_40

name
Nuts, pecans                                                                                          False
Teff, uncooked                                                                                        False
Sherbet, orange                                                                                       False
Cauliflower, raw                                                                                      False
Taro leaves, raw                                                                                      False
                                                                                                      ...  
Beef, raw, all grades, trimmed to 0" fat, separable lean and fat, boneless, top round roast, round    False
Lamb, cooked, separable lean only, composite of trimmed retail cuts, frozen, imported, New Zealand    False
Lamb, raw, separable lean and fat, composite of trimmed retail cuts, frozen, imported, New Zealand    False
Beef, raw, all grades, 

Now let's use `.loc` to apply this mask to the `nutrition_non_NA` DataFrame. We can then chain the `nsmallest()` method on the `calories` column to obtain the food with the lowest calorie count.

In [228]:
nutrition_non_NA.loc[vitC_20_40].calories.nsmallest(1)

name
Asparagus, with salt, drained, boiled, cooked, frozen    18.0
Name: calories, dtype: float64

Thus, frozen asparagus, boiled and drained with salt, has the lowest calorie count (18 calories) among foods with between 20 and 40 mg of vitamin C per 100 g serving.
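One caveat: `nsmallest(1)` returns a single row even when several foods tie for the minimum. By default (`keep="first"`), the first tied entry wins. Passing `keep="all"` returns every tied entry, which can yield more than `n` rows. This is a sketch on hypothetical calorie values with a deliberate tie.

```python
import pandas as pd

# Hypothetical calorie counts with a tie for the minimum
calories = pd.Series([18.0, 25.0, 18.0], index=["food_a", "food_b", "food_c"])

first_only = calories.nsmallest(1)
all_ties = calories.nsmallest(1, keep="all")

print(first_only.index.tolist())  # ['food_a']
print(all_ties.index.tolist())    # ['food_a', 'food_c']
```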

### Part 3
The strategy will be as follows:
* Calculate the mean and standard deviation of vitamin C for the DataFrame
* Calculate the z-scores for the vitamin C values
* Use `between()` to obtain a boolean vector of entries whose z-scores are between 2 and 3 (inclusive)
* Apply that boolean vector to the `nutrition_non_NA` DataFrame
* Count the number of entries using the `shape` attribute

Calculate the mean vitamin C value

In [247]:
mean_vitC = nutrition_non_NA.vitamin_c_mg.mean()
mean_vitC

5.553368523406037

Calculate the standard deviation for vitamin C

In [248]:
std_vitC = nutrition_non_NA.vitamin_c_mg.std()
std_vitC

46.10438522239213

Calculate the z-scores for the vitamin C values

In [249]:
vitC_z_scores = (nutrition_non_NA.vitamin_c_mg - mean_vitC) / std_vitC
vitC_z_scores

name
Nuts, pecans                                                                                         -0.096593
Teff, uncooked                                                                                       -0.120452
Sherbet, orange                                                                                      -0.070565
Cauliflower, raw                                                                                      0.925002
Taro leaves, raw                                                                                      1.007423
                                                                                                        ...   
Beef, raw, all grades, trimmed to 0" fat, separable lean and fat, boneless, top round roast, round   -0.120452
Lamb, cooked, separable lean only, composite of trimmed retail cuts, frozen, imported, New Zealand   -0.120452
Lamb, raw, separable lean and fat, composite of trimmed retail cuts, frozen, imported, New Zealand   -0.120

Obtain a boolean Series indicating whether each item is between 2 and 3 standard deviations above the mean for vitamin C

In [250]:
two_three_std = vitC_z_scores.between(2,3)
two_three_std

name
Nuts, pecans                                                                                          False
Teff, uncooked                                                                                        False
Sherbet, orange                                                                                       False
Cauliflower, raw                                                                                      False
Taro leaves, raw                                                                                      False
                                                                                                      ...  
Beef, raw, all grades, trimmed to 0" fat, separable lean and fat, boneless, top round roast, round    False
Lamb, cooked, separable lean only, composite of trimmed retail cuts, frozen, imported, New Zealand    False
Lamb, raw, separable lean and fat, composite of trimmed retail cuts, frozen, imported, New Zealand    False
Beef, raw, all grades, 

Apply the boolean vector to the `nutrition_non_NA` DataFrame

In [251]:
nutrition_non_NA.loc[two_three_std]

name,serving_size_g,calories,total_fat_g,saturated_fat_g,cholesterol_mg,sodium_mg,choline_mg,folate_mcg,folic_acid_mcg,niacin_mg,pantothenic_acid_mg,riboflavin_mg,thiamin_mg,vitamin_a_IU,vitamin_a_rae_mcg,carotene_alpha_mcg,carotene_beta_mcg,cryptoxanthin_beta_mcg,lutein_zeaxanthin_mcg,lucopene,vitamin_b12_mcg,vitamin_b6_mg,vitamin_c_mg,vitamin_d_IU,vitamin_e_mg,tocopherol_alpha_mg,vitamin_k_mcg,calcium_mg,copper_mg,irom_mg,magnesium_mg,manganese_mg,phosphorous_mg,potassium_mg,selenium_mcg,zink_mg,protein_g,alanine_g,arginine_g,aspartic_acid_g,cystine_g,glutamic_acid_g,glycine_g,histidine_g,hydroxyproline,isoleucine_g,leucine_g,lysine_g,methionine_g,phenylalanine_g,proline_g,serine_g,threonine_g,tryptophan_g,tyrosine_g,valine_g,carbohydrate_g,fiber_g,sugars_g,fructose,galactose,glucose,lactose,maltose,sucrose,fat_g,saturated_fatty_acids_g,monounsaturated_fatty_acids_g,polyunsaturated_fatty_acids_g,fatty_acids_total_trans_mg,alcohol_g,ash_g,caffeine_mg,theobromine_mg,water_g
"Peppers, raw, jalapeno",100.0,29.0,0.4,0.1,0.0,3.0,7.5,27.0,0.0,1.28,0.315,0.07,0.04,1078.0,54.0,67.0,561.0,105.0,861.0,0.0,0.0,0.419,118.6,0.0,3.58,3.58,18.5,12.0,0.046,0.25,15.0,0.097,26.0,248.0,0.4,0.14,0.91,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.5,2.8,4.12,2.63,0.0,1.48,0.0,0.0,0.0,0.37,0.092,0.029,0.112,0.0,0.0,0.53,0.0,0.0,91.69
"Kale, raw, scotch",100.0,42.0,0.6,0.1,0.0,70.0,0.0,28.0,0.0,1.3,0.076,0.06,0.07,3100.0,155.0,0.0,0.0,0.0,0.0,0.0,0.0,0.227,130.0,0.0,0.0,0.0,0.0,205.0,0.243,3.0,88.0,0.648,62.0,450.0,0.9,0.37,2.8,0.141,0.156,0.25,0.037,0.318,0.135,0.059,0.0,0.168,0.196,0.168,0.027,0.143,0.166,0.118,0.125,0.034,0.099,0.153,8.32,1.7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.6,0.078,0.045,0.289,0.0,0.0,1.28,0.0,0.0,87.0
"Parsley, fresh",100.0,36.0,0.8,0.1,0.0,56.0,12.8,152.0,0.0,1.313,0.4,0.098,0.086,8424.0,421.0,0.0,5054.0,0.0,5561.0,0.0,0.0,0.09,133.0,0.0,0.75,0.75,1640.0,138.0,0.149,6.2,50.0,0.16,58.0,554.0,0.1,1.07,2.97,0.195,0.122,0.294,0.014,0.249,0.145,0.061,0.0,0.118,0.204,0.181,0.042,0.145,0.213,0.136,0.122,0.045,0.082,0.172,6.33,3.3,0.85,0.0,0.0,0.0,0.0,0.0,0.0,0.79,0.132,0.295,0.124,0.0,0.0,2.2,0.0,0.0,87.71
Tomato powder,100.0,302.0,0.4,0.1,0.0,134.0,0.0,120.0,0.0,9.133,3.76,0.761,0.913,17247.0,862.0,0.0,10348.0,0.0,1370.0,0.0,0.0,0.457,116.7,0.0,12.25,12.25,48.8,166.0,1.241,4.56,178.0,1.951,295.0,1927.0,5.3,1.71,12.91,0.405,0.258,1.617,0.076,5.163,0.213,0.204,0.0,0.25,0.359,0.37,0.066,0.273,0.283,0.308,0.295,0.089,0.173,0.264,74.68,16.5,43.9,0.0,0.0,0.0,0.0,0.0,0.0,0.44,0.062,0.066,0.179,0.0,0.0,8.91,0.0,0.0,3.06
"Kale, raw",100.0,49.0,0.9,0.1,0.0,38.0,0.8,141.0,0.0,1.0,0.091,0.13,0.11,9990.0,500.0,54.0,5927.0,81.0,8198.0,0.0,0.0,0.271,120.0,0.0,1.54,1.54,704.8,150.0,1.499,1.47,47.0,0.659,92.0,491.0,0.9,0.56,4.28,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.75,3.6,2.26,0.0,0.0,0.0,0.0,0.0,0.0,0.93,0.091,0.052,0.338,0.0,0.0,2.01,0.0,0.0,84.04
"Snacks, rolls, fruit leather",100.0,371.0,3.0,0.7,0.0,317.0,13.2,2.0,0.0,0.1,0.029,0.02,0.084,57.0,6.0,1.0,34.0,0.0,41.0,0.0,0.0,0.3,120.0,0.0,0.56,0.56,18.2,32.0,0.171,1.01,20.0,0.184,31.0,294.0,0.4,0.19,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,85.8,0.0,49.16,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.672,1.48,0.552,0.0,0.0,0.9,0.0,0.0,10.2
"Spices, dried, parsley",100.0,292.0,5.5,1.4,0.0,452.0,97.1,180.0,0.0,9.943,1.062,2.383,0.196,1939.0,97.0,17.0,1152.0,4.0,2428.0,0.0,0.0,0.9,125.0,0.0,8.96,8.96,1359.5,1140.0,0.78,22.04,400.0,9.81,436.0,2683.0,14.1,5.44,26.63,1.778,1.756,3.169,0.298,3.688,1.756,0.718,0.0,1.546,2.794,2.098,0.596,1.712,2.01,1.159,1.193,0.475,1.159,2.021,50.64,26.7,7.27,0.42,0.0,2.76,0.0,0.0,4.09,5.48,1.378,0.761,3.124,0.0,0.0,11.36,0.0,0.0,5.89
"Tomatoes, drained, packed in oil, sun-dried",100.0,213.0,14.0,1.9,0.0,266.0,0.0,23.0,0.0,3.63,0.479,0.383,0.193,1286.0,64.0,0.0,0.0,0.0,0.0,0.0,0.0,0.319,101.8,0.0,0.0,0.0,0.0,47.0,0.473,2.68,81.0,0.466,139.0,1565.0,3.0,0.78,5.06,0.144,0.123,0.702,0.066,1.865,0.125,0.077,0.0,0.121,0.185,0.186,0.044,0.131,0.096,0.134,0.128,0.037,0.087,0.13,23.33,5.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,14.08,1.893,8.663,2.06,0.0,0.0,3.7,0.0,0.0,53.83
"Snacks, with vitamin C, pieces, fruit leather",100.0,373.0,3.5,1.0,0.0,317.0,0.0,14.0,0.0,0.1,0.0,0.1,0.043,116.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.3,120.0,0.0,0.0,0.0,0.0,18.0,0.171,0.75,14.0,0.0,24.0,164.0,3.2,0.19,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,85.2,3.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.5,0.99,1.724,0.093,0.0,0.0,1.0,0.0,0.0,10.2
"Cereals ready-to-eat, HEALTH VALLEY, OAT BRAN FLAKES",100.0,380.0,3.0,1.0,0.0,380.0,22.9,200.0,181.0,10.0,0.896,0.85,0.75,0.0,0.0,0.0,0.0,0.0,203.0,0.0,3.0,1.0,120.0,0.0,0.53,0.53,1.8,80.0,0.319,2.88,143.0,2.903,343.0,340.0,26.5,2.2,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,78.0,8.0,22.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,1.0,0.704,0.897,0.0,0.0,5.53,0.0,0.0,3.4


Determine the number of foods in this filtered DataFrame using `shape` (the first element of the tuple is the number of rows)

In [252]:
nutrition_non_NA.loc[two_three_std].shape

(17, 75)

There are 17 foods that have vitamin C content between 2 and 3 standard deviations above the mean.
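The steps above can be condensed into a few lines. Summing a boolean Series counts its `True` values, so the count comes out directly without filtering the DataFrame first. Note that pandas' `std()` computes the sample standard deviation (`ddof=1`) by default. This is a sketch on a hypothetical ten-value Series (nine zeros and one large value) chosen so that exactly one entry lands in the 2 to 3 range.

```python
import pandas as pd

# Hypothetical vitamin C values: nine zeros and one large value
vit_c = pd.Series([0.0] * 9 + [10.0])

mean = vit_c.mean()       # 1.0
std = vit_c.std()         # sample standard deviation (ddof=1), sqrt(10)
z = (vit_c - mean) / std  # z-score of every entry; the 10.0 entry scores about 2.85

# between() is inclusive by default; summing the booleans counts the True entries
count = int(z.between(2, 3).sum())
print(count)  # 1
```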

In [262]:
nutrition_non_NA['vitamin_c_mg'].describe()

count    7199.000000
mean        5.553369
std        46.104385
min         0.000000
25%         0.000000
50%         0.000000
75%         1.000000
max      1900.000000
Name: vitamin_c_mg, dtype: float64