# Section 4: Working with DataFrames
DataFrames are the blockbuster data structure in Pandas. Many of the methods and techniques we learned for Series also apply to DataFrames, where they become even more powerful!
* We'll learn how to clean data and prepare it for analysis
* We'll also learn several DataFrame manipulation techniques

In [1]:
import pandas as pd
import numpy as np

## What is a DataFrame?
A **DataFrame** is quite simply a table of data that contains a collection of rows and columns. DataFrames are generally two-dimensional (as opposed to Series, which are one-dimensional), with labeled indices and columns. However, by using multiple indices, they can also accommodate multi-dimensional data. More on that later.

In practice, this means that we need to specify more than one piece of information in order to identify a specific datapoint in a DataFrame. This contrasts with Series, in which only one piece of information is needed (usually the index label or index position).

The `.ndim` attribute gives us the number of dimensions for a Series or DataFrame. Series have 1 dimension and DataFrames have 2 dimensions.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.ndim.html
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.ndim.html

Recall the `.shape` attribute, which is also available on DataFrames. It returns a tuple giving the size of the DataFrame along each dimension, that is, the number of rows and the number of columns.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html
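
As a quick illustration of the points above, here is a minimal sketch comparing `.ndim` and `.shape` on a Series and a DataFrame. The variable names (`sample_ages`, `sample_people`) and the values are made up for this example.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
sample_ages = pd.Series([29, 21, 45])
sample_people = pd.DataFrame({'name': ['Olga', 'Andrew', 'Brian'],
                              'age': [29, 21, 45]})

print(sample_ages.ndim)     # 1 -> a Series has one dimension
print(sample_people.ndim)   # 2 -> a DataFrame has two dimensions
print(sample_ages.shape)    # (3,)
print(sample_people.shape)  # (3, 2) -> 3 rows, 2 columns

# One label is enough to identify a value in a Series ...
print(sample_ages.loc[1])
# ... but a DataFrame needs both a row label and a column label
print(sample_people.loc[1, 'age'])
```

Both lookups at the end return the same value (21), but the DataFrame lookup needs two labels to get there.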

Both Series and DataFrames are collections of values with associated labels. But whereas Series have labeled indices only, DataFrames have labeled indices and columns.

Each column in a DataFrame is actually a Series. That is, the Series object is the data structure that makes up each column of a DataFrame.

Unlike Series, DataFrames can be *heterogeneous*. That is, each column of a DataFrame can be a completely different data type.
* DataFrames themselves DO NOT have a single data type, since they are a collection of different Series. Thus, accessing the `dtype` attribute on a DataFrame raises an `AttributeError`.
* Instead, the `dtypes` attribute may be used on a DataFrame, which returns a Series with the column names as the index labels and the data types as the values.
 * https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dtypes.html
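
The relationship between columns, Series, and `dtypes` can be sketched as follows. The DataFrame `sample_people` and its values are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
sample_people = pd.DataFrame({'name': ['Olga', 'Andrew'],
                              'age': [29, 21],
                              'married': [False, True]})

# Selecting a single column returns a Series
age_column = sample_people['age']
print(type(age_column))

# dtypes returns a Series: column names as index labels, dtypes as values
column_types = sample_people.dtypes
print(column_types)

# A DataFrame has no single dtype, so this attribute lookup fails
try:
    sample_people.dtype
except AttributeError as err:
    print('AttributeError:', err)
```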

## Creating a Dataframe

Just like Series, DataFrames can be created manually. The typical way to do this is to construct a list for each of your columns, and then combine them into a DataFrame.

In [2]:
names = ['Olga','Andrew','Brian','Telulah','Nicole','Tilda']
age = [29, 21, 45, 23, 39, 46]
married = [False, True, True, True, False, True]

After creating the lists, we can build a DataFrame using the `pd.DataFrame()` constructor. We pass in a dictionary whose keys are the column names and whose values are the lists we created.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

In [3]:
pd.DataFrame({'name': names, 'age': age, 'married': married})

Unnamed: 0,name,age,married
0,Olga,29,False
1,Andrew,21,True
2,Brian,45,True
3,Telulah,23,True
4,Nicole,39,False
5,Tilda,46,True


Keep in mind that the lists you construct your DataFrame from MUST be of equal length. If they are not, Pandas raises a `ValueError`.


In [4]:
# names2 = ['Olga','Andrew','Brian','Telulah','Nicole','Tilda', 'Ryan']
# pd.DataFrame({'name': names2, 'age': age, 'married': married})
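
The length mismatch described above can be reproduced safely by catching the exception. This is a minimal sketch; the lists `short_names` and `short_ages` are hypothetical and deliberately differ in length.

```python
import pandas as pd

# Hypothetical lists of unequal length, used only for this illustration
short_names = ['Olga', 'Andrew', 'Brian']
short_ages = [29, 21]  # deliberately one item short

error_message = None
try:
    pd.DataFrame({'name': short_names, 'age': short_ages})
except ValueError as err:
    # Pandas refuses to build a DataFrame from columns of different lengths
    error_message = str(err)
    print('ValueError:', error_message)
```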

## BONUS - Four More Ways to Build DataFrames

There are other ways to create DataFrames as well. The first is to build from a **dictionary of tuples** (instead of a dictionary of lists).

In [5]:
tuple_names = tuple(names)
tuple_ages = tuple(age)
tuple_married = tuple(married)

In [6]:
pd.DataFrame({'name': tuple_names, 'age': tuple_ages, 'married': tuple_married})

Unnamed: 0,name,age,married
0,Olga,29,False
1,Andrew,21,True
2,Brian,45,True
3,Telulah,23,True
4,Nicole,39,False
5,Tilda,46,True


The second of the alternative methods is to pass in a **dictionary of Pandas Series**. All we really need to do is build a set of Series and combine them using the `pd.DataFrame()` constructor.

In [7]:
series_names = pd.Series(data = names)
series_ages = pd.Series(data = age)
series_married = pd.Series(data = married)

In [8]:
pd.DataFrame({'name': series_names, 'age': series_ages, 'married': series_married})

Unnamed: 0,name,age,married
0,Olga,29,False
1,Andrew,21,True
2,Brian,45,True
3,Telulah,23,True
4,Nicole,39,False
5,Tilda,46,True


Slightly more challenging is to use a **dictionary of dictionaries**. The strategy:
1. Replicate the upper left section of the DataFrame by hand in order to understand what shape of data we are targeting
2. Then, extend that same structure to the rest of the data programmatically so that we don't have to type it all out

In [9]:
pd.DataFrame({'names' : {0: 'Olga', 1: 'Andrew'}})

Unnamed: 0,names
0,Olga
1,Andrew


From this starter, we use the built-in Python function `enumerate()` to traverse the list and access each item in turn.

In [10]:
list(enumerate(names))

[(0, 'Olga'),
 (1, 'Andrew'),
 (2, 'Brian'),
 (3, 'Telulah'),
 (4, 'Nicole'),
 (5, 'Tilda')]

We'll use a dictionary comprehension to create key:value pairs for each of these enumerated names.

In [11]:
dict_names = {key:value for key, value in enumerate(names)}
dict_names

{0: 'Olga', 1: 'Andrew', 2: 'Brian', 3: 'Telulah', 4: 'Nicole', 5: 'Tilda'}

Let's take care of the rest of the columns, converting them all to dictionaries of keys and values.

In [12]:
dict_ages = {key:value for key, value in enumerate(age)}
print(dict_ages)
dict_married = {key:value for key, value in enumerate(married)}
print(dict_married)

{0: 29, 1: 21, 2: 45, 3: 23, 4: 39, 5: 46}
{0: False, 1: True, 2: True, 3: True, 4: False, 5: True}


Note that we could have done this using a function, which helps streamline things a bit. If you needed to change how the enumeration is carried out, you would only need to modify the one function instead of modifying each conversion individually:

In [13]:
def convert_list_to_dict(l):
  return {key:value for key, value in enumerate(l)}
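
To show the helper in use, here is a minimal sketch that repeats the function definition and applies it to two hypothetical lists (`sample_names` and `sample_ages` are made up for this example).

```python
import pandas as pd

def convert_list_to_dict(l):
    # Map each position (0, 1, 2, ...) to the item found at that position
    return {key: value for key, value in enumerate(l)}

# Hypothetical sample columns, used only for this illustration
sample_names = ['Olga', 'Andrew', 'Brian']
sample_ages = [29, 21, 45]

# Each column is now a dictionary of {row label: value}
sample_df = pd.DataFrame({'name': convert_list_to_dict(sample_names),
                          'age': convert_list_to_dict(sample_ages)})
print(sample_df)
```

The dictionary keys become the row labels, so every column must use the same keys for the rows to line up.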

Now all we have to do is construct the DataFrame!

In [14]:
pd.DataFrame({'name': dict_names,
              'ages': dict_ages,
              'married': dict_married})

Unnamed: 0,name,ages,married
0,Olga,29,False
1,Andrew,21,True
2,Brian,45,True
3,Telulah,23,True
4,Nicole,39,False
5,Tilda,46,True


Finally, we can take a row-centric approach that builds a DataFrame row by row (as opposed to the column-centric approaches above, which build DataFrames column by column) by using a **list of dictionaries**.

Let's start by replicating a single row:

In [15]:
pd.DataFrame([{'name':'Olga', 'age':29, 'married': False}])

Unnamed: 0,name,age,married
0,Olga,29,False


Next we use the built-in `zip()` function to combine multiple iterables (in this case `names`, `age`, and `married`) into a list of tuples.

In [16]:
list(zip(names, age, married))

[('Olga', 29, False),
 ('Andrew', 21, True),
 ('Brian', 45, True),
 ('Telulah', 23, True),
 ('Nicole', 39, False),
 ('Tilda', 46, True)]

Now we have to build a list of dictionaries. Each row will be its own dictionary, and we can do this using `zip()` and a list comprehension.

In [17]:
# Short loop variables avoid shadowing the `age` and `married` lists
rowwise = [{'name': n, 'age': a, 'married': m} for n, a, m in zip(names, age, married)]
rowwise

[{'age': 29, 'married': False, 'name': 'Olga'},
 {'age': 21, 'married': True, 'name': 'Andrew'},
 {'age': 45, 'married': True, 'name': 'Brian'},
 {'age': 23, 'married': True, 'name': 'Telulah'},
 {'age': 39, 'married': False, 'name': 'Nicole'},
 {'age': 46, 'married': True, 'name': 'Tilda'}]

Finally, we combine them into a DataFrame!

In [18]:
pd.DataFrame(rowwise)

Unnamed: 0,name,age,married
0,Olga,29,False
1,Andrew,21,True
2,Brian,45,True
3,Telulah,23,True
4,Nicole,39,False
5,Tilda,46,True


There are additional methods for building DataFrames, but we won't be covering them here. Again, in practice you will primarily be reading data into Pandas instead of building it from within.

## The `info()` method


The `info()` method gives you a brief summary of what a DataFrame contains. It is one of the first things you should do when getting ready to work with a new DataFrame.

`info()` prints information including the index dtype, the column names, the counts of non-null values, the column dtypes, and memory usage.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html

In [19]:
df = pd.DataFrame({'name': names, 'age': age, 'married': married})
df

Unnamed: 0,name,age,married
0,Olga,29,False
1,Andrew,21,True
2,Brian,45,True
3,Telulah,23,True
4,Nicole,39,False
5,Tilda,46,True


In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   name     6 non-null      object
 1   age      6 non-null      int64 
 2   married  6 non-null      bool  
dtypes: bool(1), int64(1), object(1)
memory usage: 230.0+ bytes


The above example was a small DataFrame, but for large DataFrames even the `info()` summary may get very long and overwhelming. We can set the `verbose` parameter to `False` so that the individual columns are not printed. The dtypes are then only summarized as counts instead of being listed for each column.

In [21]:
df.info(verbose = False)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Columns: 3 entries, name to married
dtypes: bool(1), int64(1), object(1)
memory usage: 230.0+ bytes


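We can also set the `max_cols` parameter to limit the number of columns that the verbose output will describe. If the number of columns in the DataFrame exceeds this value, `info()` falls back to the non-verbose summary. Note that `info()` prints its summary and returns `None`, which is why `None` appears after each summary when the call is wrapped in `print()`.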

In [22]:
print(df.info(max_cols=4))
print('')
print(df.info(max_cols=2))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   name     6 non-null      object
 1   age      6 non-null      int64 
 2   married  6 non-null      bool  
dtypes: bool(1), int64(1), object(1)
memory usage: 230.0+ bytes
None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Columns: 3 entries, name to married
dtypes: bool(1), int64(1), object(1)
memory usage: 230.0+ bytes
None


Perhaps the most useful parameter of `info()` is `memory_usage`, which controls how the DataFrame's size in memory is reported. If we set `memory_usage` to `'deep'`, the method calculates real memory usage through deep introspection. Otherwise, the estimate is based on column dtype and number of rows, assuming values consume the same amount of memory for corresponding dtypes.

Using *deep* `memory_usage` is slower and requires more computation.

In [23]:
df.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   name     6 non-null      object
 1   age      6 non-null      int64 
 2   married  6 non-null      bool  
dtypes: bool(1), int64(1), object(1)
memory usage: 557.0 bytes
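
For a per-column view of the same idea, the `DataFrame.memory_usage()` method accepts a `deep` flag. The following is a minimal sketch on a hypothetical DataFrame (`sample_people` is made up for this example); exact byte counts depend on the Pandas version and platform, so none are shown.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
sample_people = pd.DataFrame({'name': ['Olga', 'Andrew', 'Brian'],
                              'age': [29, 21, 45]})

# Shallow estimate: based on dtype and number of rows
shallow = sample_people.memory_usage()
# Deep estimate: also measures the contents of string/object columns
deep = sample_people.memory_usage(deep=True)

print(shallow)
print(deep)
```

The `age` column reports the same size either way, since fixed-width integers need no introspection. The `name` column is where the two estimates can differ.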


## Reading in Nutritional Data

The remainder of this section relies on a sizeable dataset of roughly 9,000 food items and their nutritional information, available at https://andybek.com/pandas-nutrition.

Import the data:

In [24]:
dataurl = 'https://andybek.com/pandas-nutrition'
nutrition = pd.read_csv(dataurl)
nutrition.head(10)

Unnamed: 0.1,Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
0,0,Cornstarch,100 g,381,0.1g,,0,9.00 mg,0.4 mg,0.00 mcg,0.00 mcg,0.000 mg,0.000 mg,0.000 mg,0.000 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,0.00 mcg,0.000 mg,0.0 mg,0.00 IU,0.00 mg,0.00 mg,0.0 mcg,2.00 mg,0.050 mg,0.47 mg,3.00 mg,0.053 mg,13.00 mg,3.00 mg,2.8 mcg,0.06 mg,0.26 g,0.019 g,0.012 g,0.020 g,0.006 g,0.053 g,0.009 g,0.008 g,0,0.010 g,0.036 g,0.006 g,0.006 g,0.013 g,0.024 g,0.012 g,0.009 g,0.001 g,0.010 g,0.014 g,91.27 g,0.9 g,0.00 g,0,0,0,0,0,0,0.05 g,0.009 g,0.016 g,0.025 g,0.00 mg,0.0 g,0.09 g,0.00 mg,0.00 mg,8.32 g
1,1,"Nuts, pecans",100 g,691,72g,6.2g,0,0.00 mg,40.5 mg,22.00 mcg,0.00 mcg,1.167 mg,0.863 mg,0.130 mg,0.660 mg,56.00 IU,3.00 mcg,0.00 mcg,29.00 mcg,9.00 mcg,17.00 mcg,0,0.00 mcg,0.210 mg,1.1 mg,0.00 IU,1.40 mg,1.40 mg,3.5 mcg,70.00 mg,1.200 mg,2.53 mg,121.00 mg,4.500 mg,277.00 mg,410.00 mg,3.8 mcg,4.53 mg,9.17 g,0.397 g,1.177 g,0.929 g,0.152 g,1.829 g,0.453 g,0.262 g,0,0.336 g,0.598 g,0.287 g,0.183 g,0.426 g,0.363 g,0.474 g,0.306 g,0.093 g,0.215 g,0.411 g,13.86 g,9.6 g,3.97 g,0.04 g,0,0.04 g,0.00 g,0.00 g,3.90 g,71.97 g,6.180 g,40.801 g,21.614 g,0.00 mg,0.0 g,1.49 g,0.00 mg,0.00 mg,3.52 g
2,2,"Eggplant, raw",100 g,25,0.2g,,0,2.00 mg,6.9 mg,22.00 mcg,0.00 mcg,0.649 mg,0.281 mg,0.037 mg,0.039 mg,23.00 IU,1.00 mcg,0.00 mcg,14.00 mcg,0.00 mcg,36.00 mcg,0,0.00 mcg,0.084 mg,2.2 mg,0.00 IU,0.30 mg,0.30 mg,3.5 mcg,9.00 mg,0.081 mg,0.23 mg,14.00 mg,0.232 mg,24.00 mg,229.00 mg,0.3 mcg,0.16 mg,0.98 g,0.051 g,0.057 g,0.164 g,0.006 g,0.186 g,0.041 g,0.023 g,0,0.045 g,0.064 g,0.047 g,0.011 g,0.043 g,0.043 g,0.042 g,0.037 g,0.009 g,0.027 g,0.053 g,5.88 g,3.0 g,3.53 g,1.54 g,0,1.58 g,0,0,0.26 g,0.18 g,0.034 g,0.016 g,0.076 g,0.00 mg,0.0 g,0.66 g,0.00 mg,0.00 mg,92.30 g
3,3,"Teff, uncooked",100 g,367,2.4g,0.4g,0,12.00 mg,13.1 mg,0,0,3.363 mg,0.942 mg,0.270 mg,0.390 mg,9.00 IU,0.00 mcg,0.00 mcg,5.00 mcg,0.00 mcg,66.00 mcg,0,0,0.482 mg,0,0,0.08 mg,0.08 mg,1.9 mcg,180.00 mg,0.810 mg,7.63 mg,184.00 mg,9.240 mg,429.00 mg,427.00 mg,4.4 mcg,3.63 mg,13.30 g,0.747 g,0.517 g,0.820 g,0.236 g,3.349 g,0.477 g,0.301 g,0,0.501 g,1.068 g,0.376 g,0.428 g,0.698 g,0.664 g,0.622 g,0.510 g,0.139 g,0.458 g,0.686 g,73.13 g,8.0 g,1.84 g,0.47 g,0.00 g,0.73 g,0.00 g,0.01 g,0.62 g,2.38 g,0.449 g,0.589 g,1.071 g,0,0,2.37 g,0,0,8.82 g
4,4,"Sherbet, orange",100 g,144,2g,1.2g,1mg,46.00 mg,7.7 mg,4.00 mcg,0.00 mcg,0.063 mg,0.224 mg,0.097 mg,0.027 mg,46.00 IU,12.00 mcg,0.00 mcg,1.00 mcg,5.00 mcg,7.00 mcg,0,0.13 mcg,0.023 mg,2.3 mg,0.00 IU,0.01 mg,0.01 mg,0.0 mcg,54.00 mg,0.028 mg,0.14 mg,8.00 mg,0.011 mg,40.00 mg,96.00 mg,1.5 mcg,0.48 mg,1.10 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30.40 g,1.3 g,24.32 g,0,0,0,0,0,0,2.00 g,1.160 g,0.530 g,0.080 g,1.00 mg,0.0 g,0.40 g,0.00 mg,0.00 mg,66.10 g
5,5,"Cauliflower, raw",100 g,25,0.3g,0.1g,0,30.00 mg,44.3 mg,57.00 mcg,0.00 mcg,0.507 mg,0.667 mg,0.060 mg,0.050 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,1.00 mcg,0,0.00 mcg,0.184 mg,48.2 mg,0.00 IU,0.08 mg,0.08 mg,15.5 mcg,22.00 mg,0.039 mg,0.42 mg,15.00 mg,0.155 mg,44.00 mg,299.00 mg,0.6 mcg,0.27 mg,1.92 g,0.116 g,0.086 g,0.177 g,0.020 g,0.257 g,0.071 g,0.056 g,0,0.071 g,0.106 g,0.217 g,0.020 g,0.065 g,0.071 g,0.086 g,0.076 g,0.020 g,0.051 g,0.125 g,4.97 g,2.0 g,1.91 g,0.97 g,0.00 g,0.94 g,0.00 g,0.00 g,0.00 g,0.28 g,0.130 g,0.034 g,0.031 g,0.00 mg,0.0 g,0.76 g,0.00 mg,0.00 mg,92.07 g
6,6,"Taro leaves, raw",100 g,42,0.7g,0.2g,0,3.00 mg,12.8 mg,126.00 mcg,0.00 mcg,1.513 mg,0.084 mg,0.456 mg,0.209 mg,4825.00 IU,241.00 mcg,0.00 mcg,2895.00 mcg,0.00 mcg,1932.00 mcg,0,0.00 mcg,0.146 mg,52.0 mg,0.00 IU,2.02 mg,2.02 mg,108.6 mcg,107.00 mg,0.270 mg,2.25 mg,45.00 mg,0.714 mg,60.00 mg,648.00 mg,0.9 mcg,0.41 mg,4.98 g,0,0.220 g,0,0.064 g,0,0,0.114 g,0,0.260 g,0.392 g,0.246 g,0.079 g,0.195 g,0,0,0.167 g,0.048 g,0.178 g,0.256 g,6.70 g,3.7 g,3.01 g,0,0,0,0,0,0,0.74 g,0.151 g,0.060 g,0.307 g,0.00 mg,0.0 g,1.92 g,0.00 mg,0.00 mg,85.66 g
7,7,"Lamb, raw, ground",100 g,282,23g,10g,73mg,59.00 mg,69.3 mg,18.00 mcg,0.00 mcg,5.960 mg,0.650 mg,0.210 mg,0.110 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,2.31 mcg,0.130 mg,0.0 mg,2.00 IU,0.20 mg,0.20 mg,3.6 mcg,16.00 mg,0.101 mg,1.55 mg,21.00 mg,0.019 mg,157.00 mg,222.00 mg,18.8 mcg,3.41 mg,16.56 g,0.996 g,0.984 g,1.457 g,0.198 g,2.402 g,0.809 g,0.524 g,0,0.799 g,1.288 g,1.462 g,0.425 g,0.674 g,0.694 g,0.615 g,0.709 g,0.193 g,0.556 g,0.893 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,23.41 g,10.190 g,9.600 g,1.850 g,73.00 mg,0.0 g,0.87 g,0.00 mg,0.00 mg,59.47 g
8,8,"Cheese, camembert",100 g,300,24g,15g,72mg,842.00 mg,15.4 mg,62.00 mcg,0.00 mcg,0.630 mg,1.364 mg,0.488 mg,0.028 mg,820.00 IU,241.00 mcg,0.00 mcg,12.00 mcg,0.00 mcg,0.00 mcg,0,1.30 mcg,0.227 mg,0.0 mg,18.00 IU,0.21 mg,0.21 mg,2.0 mcg,388.00 mg,0.021 mg,0.33 mg,20.00 mg,0.038 mg,347.00 mg,187.00 mg,14.5 mcg,2.38 mg,19.80 g,0.819 g,0.701 g,1.288 g,0.109 g,4.187 g,0.379 g,0.683 g,0,0.968 g,1.840 g,1.766 g,0.565 g,1.105 g,2.346 g,1.114 g,0.717 g,0.307 g,1.145 g,1.279 g,0.46 g,0.0 g,0.46 g,0,0,0,0,0,0,24.26 g,15.259 g,7.023 g,0.724 g,72.00 mg,0.0 g,3.68 g,0.00 mg,0.00 mg,51.80 g
9,9,Vegetarian fillets,100 g,290,18g,2.8g,0,490.00 mg,82.0 mg,102.00 mcg,0.00 mcg,12.000 mg,0,0.900 mg,1.100 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,4.20 mcg,1.500 mg,0.0 mg,0.00 IU,3.45 mg,3.45 mg,0.0 mcg,95.00 mg,0.925 mg,2.00 mg,23.00 mg,0,450.00 mg,600.00 mg,1.0 mcg,1.40 mg,23.00 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9.00 g,6.1 g,0.80 g,0,0,0,0,0,0,18.00 g,2.849 g,4.376 g,9.332 g,0.00 mg,0.0 g,5.00 g,0.00 mg,0.00 mg,45.00 g


This is a fairly large DataFrame. The `info()` method shows that it has 77 columns and 8789 entries, and that it takes up about 39.2 MB of memory.

Note that one of the dtypes is called `object`. This is because, for many of the columns, the units are embedded with the values, so Pandas cannot treat them as numeric. Instead, they are read in as strings, and Pandas stores strings with the `object` dtype.

In [25]:
nutrition.info(verbose = False, memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8789 entries, 0 to 8788
Columns: 77 entries, Unnamed: 0 to water
dtypes: int64(3), object(74)
memory usage: 39.2 MB
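
To see why unit-bearing values are not treated as numbers, consider this minimal sketch. The Series `sodium_raw` is a hypothetical stand-in for one of the dataset's columns, and stripping the unit with `str.replace()` is just one possible way to recover numeric values, not necessarily the approach used later in this section.

```python
import pandas as pd

# Hypothetical values mimicking a unit-bearing column in the dataset
sodium_raw = pd.Series(['9.00 mg', '0.00 mg', '2.00 mg'])
print(pd.api.types.is_numeric_dtype(sodium_raw))  # False: stored as text

# Strip the unit suffix, then convert the remaining text to numbers
sodium_mg = pd.to_numeric(sodium_raw.str.replace(' mg', '', regex=False))
print(pd.api.types.is_numeric_dtype(sodium_mg))   # True
print(sodium_mg.sum())                            # 11.0
```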


## Cleanup: Removing the Duplicated Index
Let's take our first crack at cleaning data! If you look at the DataFrame, there is an extra column called `Unnamed: 0`, which appears to be a repetition of the index labels. This is NOT an index column, but rather just another column.

In [26]:
nutrition['Unnamed: 0']

0          0
1          1
2          2
3          3
4          4
        ... 
8784    8784
8785    8785
8786    8786
8787    8787
8788    8788
Name: Unnamed: 0, Length: 8789, dtype: int64

We should remove this column as it is not useful. One option is to use the `drop()` method. All we need to do is specify the label to be dropped as well as the `axis`. Remember that DataFrames are two-dimensional, so a label could refer to either an index label or a column name, and we have to tell Pandas which one it is: `axis = 0` for rows (the default) or `axis = 1` for columns.

By default, this returns a new DataFrame with the indicated column(s) dropped and leaves the original DataFrame unchanged.

In [27]:
nutrition.drop('Unnamed: 0', axis = 1)

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
0,Cornstarch,100 g,381,0.1g,,0,9.00 mg,0.4 mg,0.00 mcg,0.00 mcg,0.000 mg,0.000 mg,0.000 mg,0.000 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,0.00 mcg,0.000 mg,0.0 mg,0.00 IU,0.00 mg,0.00 mg,0.0 mcg,2.00 mg,0.050 mg,0.47 mg,3.00 mg,0.053 mg,13.00 mg,3.00 mg,2.8 mcg,0.06 mg,0.26 g,0.019 g,0.012 g,0.020 g,0.006 g,0.053 g,0.009 g,0.008 g,0,0.010 g,0.036 g,0.006 g,0.006 g,0.013 g,0.024 g,0.012 g,0.009 g,0.001 g,0.010 g,0.014 g,91.27 g,0.9 g,0.00 g,0,0,0,0,0,0,0.05 g,0.009 g,0.016 g,0.025 g,0.00 mg,0.0 g,0.09 g,0.00 mg,0.00 mg,8.32 g
1,"Nuts, pecans",100 g,691,72g,6.2g,0,0.00 mg,40.5 mg,22.00 mcg,0.00 mcg,1.167 mg,0.863 mg,0.130 mg,0.660 mg,56.00 IU,3.00 mcg,0.00 mcg,29.00 mcg,9.00 mcg,17.00 mcg,0,0.00 mcg,0.210 mg,1.1 mg,0.00 IU,1.40 mg,1.40 mg,3.5 mcg,70.00 mg,1.200 mg,2.53 mg,121.00 mg,4.500 mg,277.00 mg,410.00 mg,3.8 mcg,4.53 mg,9.17 g,0.397 g,1.177 g,0.929 g,0.152 g,1.829 g,0.453 g,0.262 g,0,0.336 g,0.598 g,0.287 g,0.183 g,0.426 g,0.363 g,0.474 g,0.306 g,0.093 g,0.215 g,0.411 g,13.86 g,9.6 g,3.97 g,0.04 g,0,0.04 g,0.00 g,0.00 g,3.90 g,71.97 g,6.180 g,40.801 g,21.614 g,0.00 mg,0.0 g,1.49 g,0.00 mg,0.00 mg,3.52 g
2,"Eggplant, raw",100 g,25,0.2g,,0,2.00 mg,6.9 mg,22.00 mcg,0.00 mcg,0.649 mg,0.281 mg,0.037 mg,0.039 mg,23.00 IU,1.00 mcg,0.00 mcg,14.00 mcg,0.00 mcg,36.00 mcg,0,0.00 mcg,0.084 mg,2.2 mg,0.00 IU,0.30 mg,0.30 mg,3.5 mcg,9.00 mg,0.081 mg,0.23 mg,14.00 mg,0.232 mg,24.00 mg,229.00 mg,0.3 mcg,0.16 mg,0.98 g,0.051 g,0.057 g,0.164 g,0.006 g,0.186 g,0.041 g,0.023 g,0,0.045 g,0.064 g,0.047 g,0.011 g,0.043 g,0.043 g,0.042 g,0.037 g,0.009 g,0.027 g,0.053 g,5.88 g,3.0 g,3.53 g,1.54 g,0,1.58 g,0,0,0.26 g,0.18 g,0.034 g,0.016 g,0.076 g,0.00 mg,0.0 g,0.66 g,0.00 mg,0.00 mg,92.30 g
3,"Teff, uncooked",100 g,367,2.4g,0.4g,0,12.00 mg,13.1 mg,0,0,3.363 mg,0.942 mg,0.270 mg,0.390 mg,9.00 IU,0.00 mcg,0.00 mcg,5.00 mcg,0.00 mcg,66.00 mcg,0,0,0.482 mg,0,0,0.08 mg,0.08 mg,1.9 mcg,180.00 mg,0.810 mg,7.63 mg,184.00 mg,9.240 mg,429.00 mg,427.00 mg,4.4 mcg,3.63 mg,13.30 g,0.747 g,0.517 g,0.820 g,0.236 g,3.349 g,0.477 g,0.301 g,0,0.501 g,1.068 g,0.376 g,0.428 g,0.698 g,0.664 g,0.622 g,0.510 g,0.139 g,0.458 g,0.686 g,73.13 g,8.0 g,1.84 g,0.47 g,0.00 g,0.73 g,0.00 g,0.01 g,0.62 g,2.38 g,0.449 g,0.589 g,1.071 g,0,0,2.37 g,0,0,8.82 g
4,"Sherbet, orange",100 g,144,2g,1.2g,1mg,46.00 mg,7.7 mg,4.00 mcg,0.00 mcg,0.063 mg,0.224 mg,0.097 mg,0.027 mg,46.00 IU,12.00 mcg,0.00 mcg,1.00 mcg,5.00 mcg,7.00 mcg,0,0.13 mcg,0.023 mg,2.3 mg,0.00 IU,0.01 mg,0.01 mg,0.0 mcg,54.00 mg,0.028 mg,0.14 mg,8.00 mg,0.011 mg,40.00 mg,96.00 mg,1.5 mcg,0.48 mg,1.10 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30.40 g,1.3 g,24.32 g,0,0,0,0,0,0,2.00 g,1.160 g,0.530 g,0.080 g,1.00 mg,0.0 g,0.40 g,0.00 mg,0.00 mg,66.10 g
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8784,"Beef, raw, all grades, trimmed to 0"" fat, sepa...",100 g,125,3.5g,1.4g,62mg,54.00 mg,64.5 mg,4.00 mcg,0.00 mcg,6.422 mg,0.356 mg,0.234 mg,0.063 mg,11.00 IU,3.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,1.64 mcg,0.631 mg,0.0 mg,1.00 IU,0.23 mg,0.23 mg,1.5 mcg,13.00 mg,0.048 mg,2.33 mg,12.00 mg,0.004 mg,219.00 mg,311.00 mg,22.1 mcg,3.67 mg,23.45 g,1.454 g,1.597 g,2.285 g,0.239 g,3.834 g,1.154 g,0.879 g,0.160 g,1.092 g,2.021 g,2.246 g,0.635 g,0.941 g,1.052 g,0.966 g,1.105 g,0.262 g,0.874 g,1.172 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,3.50 g,1.353 g,1.554 g,0.244 g,62.00 mg,0.0 g,1.11 g,0.00 mg,0.00 mg,72.51 g
8785,"Lamb, cooked, separable lean only, composite o...",100 g,206,8.9g,3.9g,109mg,50.00 mg,0,0.00 mcg,0.00 mcg,7.680 mg,0.580 mg,0.500 mg,0.130 mg,0.00 IU,0.00 mcg,0,0,0,0,0,2.95 mcg,0.140 mg,0.0 mg,0,0.19 mg,0.19 mg,0,13.00 mg,0.114 mg,2.35 mg,22.00 mg,0.029 mg,246.00 mg,188.00 mg,2.0 mcg,4.30 mg,29.59 g,1.780 g,1.758 g,2.605 g,0.353 g,4.294 g,1.445 g,0.937 g,0,1.428 g,2.302 g,2.613 g,0.759 g,1.205 g,1.241 g,1.100 g,1.267 g,0.346 g,0.995 g,1.597 g,0.00 g,0.0 g,0,0,0,0,0,0,0,8.86 g,3.860 g,3.480 g,0.520 g,109.00 mg,0,1.60 g,0,0,59.95 g
8786,"Lamb, raw, separable lean and fat, composite o...",100 g,277,23g,12g,78mg,39.00 mg,0,1.00 mcg,0.00 mcg,6.550 mg,0.520 mg,0.320 mg,0.130 mg,0.00 IU,0.00 mcg,0,0,0,0,0,2.42 mcg,0.110 mg,0.0 mg,0,0.21 mg,0.21 mg,0,13.00 mg,0.083 mg,1.49 mg,15.00 mg,0.018 mg,168.00 mg,136.00 mg,1.3 mcg,2.39 mg,16.74 g,1.007 g,0.994 g,1.473 g,0.200 g,2.429 g,0.818 g,0.530 g,0,0.808 g,1.302 g,1.478 g,0.430 g,0.681 g,0.702 g,0.622 g,0.716 g,0.196 g,0.563 g,0.903 g,0.00 g,0.0 g,0,0,0,0,0,0,0,22.74 g,11.570 g,8.720 g,0.980 g,78.00 mg,0,0.92 g,0,0,59.80 g
8787,"Beef, raw, all grades, trimmed to 0"" fat, sepa...",100 g,121,3g,1.1g,60mg,53.00 mg,64.2 mg,4.00 mcg,0.00 mcg,6.720 mg,0.355 mg,0.184 mg,0.063 mg,4.00 IU,1.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,1.84 mcg,0.644 mg,0.0 mg,1.00 IU,0.24 mg,0.24 mg,1.5 mcg,13.00 mg,0.042 mg,1.45 mg,12.00 mg,0.001 mg,222.00 mg,319.00 mg,22.6 mcg,3.42 mg,23.37 g,1.525 g,1.714 g,2.468 g,0.256 g,4.167 g,1.101 g,0.948 g,0.118 g,1.192 g,2.198 g,2.457 g,0.679 g,1.018 g,1.073 g,1.037 g,1.201 g,0.287 g,0.954 g,1.259 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,3.04 g,1.086 g,1.266 g,0.233 g,60.00 mg,0.0 g,1.10 g,0.00 mg,0.00 mg,73.43 g
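
The behavior of `drop()` is easier to see on a small table. Here is a minimal sketch using a hypothetical three-row miniature of the nutrition data (`sample_foods` is made up for this example).

```python
import pandas as pd

# Hypothetical miniature of the nutrition table
sample_foods = pd.DataFrame({'Unnamed: 0': [0, 1, 2],
                             'name': ['Cornstarch', 'Nuts, pecans', 'Eggplant, raw'],
                             'calories': [381, 691, 25]})

# axis=1 tells Pandas that the label refers to a column
trimmed = sample_foods.drop('Unnamed: 0', axis=1)

print(trimmed.columns.tolist())       # ['name', 'calories']
print(sample_foods.columns.tolist())  # the original still has all three columns

# Equivalent spelling that names the axis through the keyword
same_result = sample_foods.drop(columns='Unnamed: 0')
```

Because `drop()` returns a new DataFrame, the result has to be assigned to a variable to be kept.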


Another option is to designate the unwanted column as the index using the `set_index()` method. That works for this example because the actual index and the unwanted column are the same. In practice, you will want to ensure that the column you set as the index is truly appropriate as an index.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html

In any case, the column indicated will replace the integer index column.

In [28]:
nutrition.set_index('Unnamed: 0')

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
0,Cornstarch,100 g,381,0.1g,,0,9.00 mg,0.4 mg,0.00 mcg,0.00 mcg,0.000 mg,0.000 mg,0.000 mg,0.000 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,0.00 mcg,0.000 mg,0.0 mg,0.00 IU,0.00 mg,0.00 mg,0.0 mcg,2.00 mg,0.050 mg,0.47 mg,3.00 mg,0.053 mg,13.00 mg,3.00 mg,2.8 mcg,0.06 mg,0.26 g,0.019 g,0.012 g,0.020 g,0.006 g,0.053 g,0.009 g,0.008 g,0,0.010 g,0.036 g,0.006 g,0.006 g,0.013 g,0.024 g,0.012 g,0.009 g,0.001 g,0.010 g,0.014 g,91.27 g,0.9 g,0.00 g,0,0,0,0,0,0,0.05 g,0.009 g,0.016 g,0.025 g,0.00 mg,0.0 g,0.09 g,0.00 mg,0.00 mg,8.32 g
1,"Nuts, pecans",100 g,691,72g,6.2g,0,0.00 mg,40.5 mg,22.00 mcg,0.00 mcg,1.167 mg,0.863 mg,0.130 mg,0.660 mg,56.00 IU,3.00 mcg,0.00 mcg,29.00 mcg,9.00 mcg,17.00 mcg,0,0.00 mcg,0.210 mg,1.1 mg,0.00 IU,1.40 mg,1.40 mg,3.5 mcg,70.00 mg,1.200 mg,2.53 mg,121.00 mg,4.500 mg,277.00 mg,410.00 mg,3.8 mcg,4.53 mg,9.17 g,0.397 g,1.177 g,0.929 g,0.152 g,1.829 g,0.453 g,0.262 g,0,0.336 g,0.598 g,0.287 g,0.183 g,0.426 g,0.363 g,0.474 g,0.306 g,0.093 g,0.215 g,0.411 g,13.86 g,9.6 g,3.97 g,0.04 g,0,0.04 g,0.00 g,0.00 g,3.90 g,71.97 g,6.180 g,40.801 g,21.614 g,0.00 mg,0.0 g,1.49 g,0.00 mg,0.00 mg,3.52 g
2,"Eggplant, raw",100 g,25,0.2g,,0,2.00 mg,6.9 mg,22.00 mcg,0.00 mcg,0.649 mg,0.281 mg,0.037 mg,0.039 mg,23.00 IU,1.00 mcg,0.00 mcg,14.00 mcg,0.00 mcg,36.00 mcg,0,0.00 mcg,0.084 mg,2.2 mg,0.00 IU,0.30 mg,0.30 mg,3.5 mcg,9.00 mg,0.081 mg,0.23 mg,14.00 mg,0.232 mg,24.00 mg,229.00 mg,0.3 mcg,0.16 mg,0.98 g,0.051 g,0.057 g,0.164 g,0.006 g,0.186 g,0.041 g,0.023 g,0,0.045 g,0.064 g,0.047 g,0.011 g,0.043 g,0.043 g,0.042 g,0.037 g,0.009 g,0.027 g,0.053 g,5.88 g,3.0 g,3.53 g,1.54 g,0,1.58 g,0,0,0.26 g,0.18 g,0.034 g,0.016 g,0.076 g,0.00 mg,0.0 g,0.66 g,0.00 mg,0.00 mg,92.30 g
3,"Teff, uncooked",100 g,367,2.4g,0.4g,0,12.00 mg,13.1 mg,0,0,3.363 mg,0.942 mg,0.270 mg,0.390 mg,9.00 IU,0.00 mcg,0.00 mcg,5.00 mcg,0.00 mcg,66.00 mcg,0,0,0.482 mg,0,0,0.08 mg,0.08 mg,1.9 mcg,180.00 mg,0.810 mg,7.63 mg,184.00 mg,9.240 mg,429.00 mg,427.00 mg,4.4 mcg,3.63 mg,13.30 g,0.747 g,0.517 g,0.820 g,0.236 g,3.349 g,0.477 g,0.301 g,0,0.501 g,1.068 g,0.376 g,0.428 g,0.698 g,0.664 g,0.622 g,0.510 g,0.139 g,0.458 g,0.686 g,73.13 g,8.0 g,1.84 g,0.47 g,0.00 g,0.73 g,0.00 g,0.01 g,0.62 g,2.38 g,0.449 g,0.589 g,1.071 g,0,0,2.37 g,0,0,8.82 g
4,"Sherbet, orange",100 g,144,2g,1.2g,1mg,46.00 mg,7.7 mg,4.00 mcg,0.00 mcg,0.063 mg,0.224 mg,0.097 mg,0.027 mg,46.00 IU,12.00 mcg,0.00 mcg,1.00 mcg,5.00 mcg,7.00 mcg,0,0.13 mcg,0.023 mg,2.3 mg,0.00 IU,0.01 mg,0.01 mg,0.0 mcg,54.00 mg,0.028 mg,0.14 mg,8.00 mg,0.011 mg,40.00 mg,96.00 mg,1.5 mcg,0.48 mg,1.10 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30.40 g,1.3 g,24.32 g,0,0,0,0,0,0,2.00 g,1.160 g,0.530 g,0.080 g,1.00 mg,0.0 g,0.40 g,0.00 mg,0.00 mg,66.10 g
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8784,"Beef, raw, all grades, trimmed to 0"" fat, sepa...",100 g,125,3.5g,1.4g,62mg,54.00 mg,64.5 mg,4.00 mcg,0.00 mcg,6.422 mg,0.356 mg,0.234 mg,0.063 mg,11.00 IU,3.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,1.64 mcg,0.631 mg,0.0 mg,1.00 IU,0.23 mg,0.23 mg,1.5 mcg,13.00 mg,0.048 mg,2.33 mg,12.00 mg,0.004 mg,219.00 mg,311.00 mg,22.1 mcg,3.67 mg,23.45 g,1.454 g,1.597 g,2.285 g,0.239 g,3.834 g,1.154 g,0.879 g,0.160 g,1.092 g,2.021 g,2.246 g,0.635 g,0.941 g,1.052 g,0.966 g,1.105 g,0.262 g,0.874 g,1.172 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,3.50 g,1.353 g,1.554 g,0.244 g,62.00 mg,0.0 g,1.11 g,0.00 mg,0.00 mg,72.51 g
8785,"Lamb, cooked, separable lean only, composite o...",100 g,206,8.9g,3.9g,109mg,50.00 mg,0,0.00 mcg,0.00 mcg,7.680 mg,0.580 mg,0.500 mg,0.130 mg,0.00 IU,0.00 mcg,0,0,0,0,0,2.95 mcg,0.140 mg,0.0 mg,0,0.19 mg,0.19 mg,0,13.00 mg,0.114 mg,2.35 mg,22.00 mg,0.029 mg,246.00 mg,188.00 mg,2.0 mcg,4.30 mg,29.59 g,1.780 g,1.758 g,2.605 g,0.353 g,4.294 g,1.445 g,0.937 g,0,1.428 g,2.302 g,2.613 g,0.759 g,1.205 g,1.241 g,1.100 g,1.267 g,0.346 g,0.995 g,1.597 g,0.00 g,0.0 g,0,0,0,0,0,0,0,8.86 g,3.860 g,3.480 g,0.520 g,109.00 mg,0,1.60 g,0,0,59.95 g
8786,"Lamb, raw, separable lean and fat, composite o...",100 g,277,23g,12g,78mg,39.00 mg,0,1.00 mcg,0.00 mcg,6.550 mg,0.520 mg,0.320 mg,0.130 mg,0.00 IU,0.00 mcg,0,0,0,0,0,2.42 mcg,0.110 mg,0.0 mg,0,0.21 mg,0.21 mg,0,13.00 mg,0.083 mg,1.49 mg,15.00 mg,0.018 mg,168.00 mg,136.00 mg,1.3 mcg,2.39 mg,16.74 g,1.007 g,0.994 g,1.473 g,0.200 g,2.429 g,0.818 g,0.530 g,0,0.808 g,1.302 g,1.478 g,0.430 g,0.681 g,0.702 g,0.622 g,0.716 g,0.196 g,0.563 g,0.903 g,0.00 g,0.0 g,0,0,0,0,0,0,0,22.74 g,11.570 g,8.720 g,0.980 g,78.00 mg,0,0.92 g,0,0,59.80 g
8787,"Beef, raw, all grades, trimmed to 0"" fat, sepa...",100 g,121,3g,1.1g,60mg,53.00 mg,64.2 mg,4.00 mcg,0.00 mcg,6.720 mg,0.355 mg,0.184 mg,0.063 mg,4.00 IU,1.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,1.84 mcg,0.644 mg,0.0 mg,1.00 IU,0.24 mg,0.24 mg,1.5 mcg,13.00 mg,0.042 mg,1.45 mg,12.00 mg,0.001 mg,222.00 mg,319.00 mg,22.6 mcg,3.42 mg,23.37 g,1.525 g,1.714 g,2.468 g,0.256 g,4.167 g,1.101 g,0.948 g,0.118 g,1.192 g,2.198 g,2.457 g,0.679 g,1.018 g,1.073 g,1.037 g,1.201 g,0.287 g,0.954 g,1.259 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,3.04 g,1.086 g,1.266 g,0.233 g,60.00 mg,0.0 g,1.10 g,0.00 mg,0.00 mg,73.43 g


A third approach, advocated by the instructor, is to recognize that the `Unnamed: 0` column is almost certainly the index for this dataset, something we overlooked when first reading in the data. Once we recognize this, we can tell Pandas in the `read_csv()` call which column to use as the index (this one, or any other column) with the `index_col` parameter.

The `index_col` parameter accepts either a column position or a column label. Specifying `index_col=0` tells Pandas that the first column (at position 0) is the index column. The cell below passes the label `'Unnamed: 0'` instead, which selects the same column. If we do not specify `index_col`, Pandas creates a default integer index.

In [29]:
nutrition = pd.read_csv(dataurl, index_col='Unnamed: 0')
nutrition.head(10)

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
0,Cornstarch,100 g,381,0.1g,,0,9.00 mg,0.4 mg,0.00 mcg,0.00 mcg,0.000 mg,0.000 mg,0.000 mg,0.000 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,0.00 mcg,0.000 mg,0.0 mg,0.00 IU,0.00 mg,0.00 mg,0.0 mcg,2.00 mg,0.050 mg,0.47 mg,3.00 mg,0.053 mg,13.00 mg,3.00 mg,2.8 mcg,0.06 mg,0.26 g,0.019 g,0.012 g,0.020 g,0.006 g,0.053 g,0.009 g,0.008 g,0,0.010 g,0.036 g,0.006 g,0.006 g,0.013 g,0.024 g,0.012 g,0.009 g,0.001 g,0.010 g,0.014 g,91.27 g,0.9 g,0.00 g,0,0,0,0,0,0,0.05 g,0.009 g,0.016 g,0.025 g,0.00 mg,0.0 g,0.09 g,0.00 mg,0.00 mg,8.32 g
1,"Nuts, pecans",100 g,691,72g,6.2g,0,0.00 mg,40.5 mg,22.00 mcg,0.00 mcg,1.167 mg,0.863 mg,0.130 mg,0.660 mg,56.00 IU,3.00 mcg,0.00 mcg,29.00 mcg,9.00 mcg,17.00 mcg,0,0.00 mcg,0.210 mg,1.1 mg,0.00 IU,1.40 mg,1.40 mg,3.5 mcg,70.00 mg,1.200 mg,2.53 mg,121.00 mg,4.500 mg,277.00 mg,410.00 mg,3.8 mcg,4.53 mg,9.17 g,0.397 g,1.177 g,0.929 g,0.152 g,1.829 g,0.453 g,0.262 g,0,0.336 g,0.598 g,0.287 g,0.183 g,0.426 g,0.363 g,0.474 g,0.306 g,0.093 g,0.215 g,0.411 g,13.86 g,9.6 g,3.97 g,0.04 g,0,0.04 g,0.00 g,0.00 g,3.90 g,71.97 g,6.180 g,40.801 g,21.614 g,0.00 mg,0.0 g,1.49 g,0.00 mg,0.00 mg,3.52 g
2,"Eggplant, raw",100 g,25,0.2g,,0,2.00 mg,6.9 mg,22.00 mcg,0.00 mcg,0.649 mg,0.281 mg,0.037 mg,0.039 mg,23.00 IU,1.00 mcg,0.00 mcg,14.00 mcg,0.00 mcg,36.00 mcg,0,0.00 mcg,0.084 mg,2.2 mg,0.00 IU,0.30 mg,0.30 mg,3.5 mcg,9.00 mg,0.081 mg,0.23 mg,14.00 mg,0.232 mg,24.00 mg,229.00 mg,0.3 mcg,0.16 mg,0.98 g,0.051 g,0.057 g,0.164 g,0.006 g,0.186 g,0.041 g,0.023 g,0,0.045 g,0.064 g,0.047 g,0.011 g,0.043 g,0.043 g,0.042 g,0.037 g,0.009 g,0.027 g,0.053 g,5.88 g,3.0 g,3.53 g,1.54 g,0,1.58 g,0,0,0.26 g,0.18 g,0.034 g,0.016 g,0.076 g,0.00 mg,0.0 g,0.66 g,0.00 mg,0.00 mg,92.30 g
3,"Teff, uncooked",100 g,367,2.4g,0.4g,0,12.00 mg,13.1 mg,0,0,3.363 mg,0.942 mg,0.270 mg,0.390 mg,9.00 IU,0.00 mcg,0.00 mcg,5.00 mcg,0.00 mcg,66.00 mcg,0,0,0.482 mg,0,0,0.08 mg,0.08 mg,1.9 mcg,180.00 mg,0.810 mg,7.63 mg,184.00 mg,9.240 mg,429.00 mg,427.00 mg,4.4 mcg,3.63 mg,13.30 g,0.747 g,0.517 g,0.820 g,0.236 g,3.349 g,0.477 g,0.301 g,0,0.501 g,1.068 g,0.376 g,0.428 g,0.698 g,0.664 g,0.622 g,0.510 g,0.139 g,0.458 g,0.686 g,73.13 g,8.0 g,1.84 g,0.47 g,0.00 g,0.73 g,0.00 g,0.01 g,0.62 g,2.38 g,0.449 g,0.589 g,1.071 g,0,0,2.37 g,0,0,8.82 g
4,"Sherbet, orange",100 g,144,2g,1.2g,1mg,46.00 mg,7.7 mg,4.00 mcg,0.00 mcg,0.063 mg,0.224 mg,0.097 mg,0.027 mg,46.00 IU,12.00 mcg,0.00 mcg,1.00 mcg,5.00 mcg,7.00 mcg,0,0.13 mcg,0.023 mg,2.3 mg,0.00 IU,0.01 mg,0.01 mg,0.0 mcg,54.00 mg,0.028 mg,0.14 mg,8.00 mg,0.011 mg,40.00 mg,96.00 mg,1.5 mcg,0.48 mg,1.10 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30.40 g,1.3 g,24.32 g,0,0,0,0,0,0,2.00 g,1.160 g,0.530 g,0.080 g,1.00 mg,0.0 g,0.40 g,0.00 mg,0.00 mg,66.10 g
5,"Cauliflower, raw",100 g,25,0.3g,0.1g,0,30.00 mg,44.3 mg,57.00 mcg,0.00 mcg,0.507 mg,0.667 mg,0.060 mg,0.050 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,1.00 mcg,0,0.00 mcg,0.184 mg,48.2 mg,0.00 IU,0.08 mg,0.08 mg,15.5 mcg,22.00 mg,0.039 mg,0.42 mg,15.00 mg,0.155 mg,44.00 mg,299.00 mg,0.6 mcg,0.27 mg,1.92 g,0.116 g,0.086 g,0.177 g,0.020 g,0.257 g,0.071 g,0.056 g,0,0.071 g,0.106 g,0.217 g,0.020 g,0.065 g,0.071 g,0.086 g,0.076 g,0.020 g,0.051 g,0.125 g,4.97 g,2.0 g,1.91 g,0.97 g,0.00 g,0.94 g,0.00 g,0.00 g,0.00 g,0.28 g,0.130 g,0.034 g,0.031 g,0.00 mg,0.0 g,0.76 g,0.00 mg,0.00 mg,92.07 g
6,"Taro leaves, raw",100 g,42,0.7g,0.2g,0,3.00 mg,12.8 mg,126.00 mcg,0.00 mcg,1.513 mg,0.084 mg,0.456 mg,0.209 mg,4825.00 IU,241.00 mcg,0.00 mcg,2895.00 mcg,0.00 mcg,1932.00 mcg,0,0.00 mcg,0.146 mg,52.0 mg,0.00 IU,2.02 mg,2.02 mg,108.6 mcg,107.00 mg,0.270 mg,2.25 mg,45.00 mg,0.714 mg,60.00 mg,648.00 mg,0.9 mcg,0.41 mg,4.98 g,0,0.220 g,0,0.064 g,0,0,0.114 g,0,0.260 g,0.392 g,0.246 g,0.079 g,0.195 g,0,0,0.167 g,0.048 g,0.178 g,0.256 g,6.70 g,3.7 g,3.01 g,0,0,0,0,0,0,0.74 g,0.151 g,0.060 g,0.307 g,0.00 mg,0.0 g,1.92 g,0.00 mg,0.00 mg,85.66 g
7,"Lamb, raw, ground",100 g,282,23g,10g,73mg,59.00 mg,69.3 mg,18.00 mcg,0.00 mcg,5.960 mg,0.650 mg,0.210 mg,0.110 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,2.31 mcg,0.130 mg,0.0 mg,2.00 IU,0.20 mg,0.20 mg,3.6 mcg,16.00 mg,0.101 mg,1.55 mg,21.00 mg,0.019 mg,157.00 mg,222.00 mg,18.8 mcg,3.41 mg,16.56 g,0.996 g,0.984 g,1.457 g,0.198 g,2.402 g,0.809 g,0.524 g,0,0.799 g,1.288 g,1.462 g,0.425 g,0.674 g,0.694 g,0.615 g,0.709 g,0.193 g,0.556 g,0.893 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,23.41 g,10.190 g,9.600 g,1.850 g,73.00 mg,0.0 g,0.87 g,0.00 mg,0.00 mg,59.47 g
8,"Cheese, camembert",100 g,300,24g,15g,72mg,842.00 mg,15.4 mg,62.00 mcg,0.00 mcg,0.630 mg,1.364 mg,0.488 mg,0.028 mg,820.00 IU,241.00 mcg,0.00 mcg,12.00 mcg,0.00 mcg,0.00 mcg,0,1.30 mcg,0.227 mg,0.0 mg,18.00 IU,0.21 mg,0.21 mg,2.0 mcg,388.00 mg,0.021 mg,0.33 mg,20.00 mg,0.038 mg,347.00 mg,187.00 mg,14.5 mcg,2.38 mg,19.80 g,0.819 g,0.701 g,1.288 g,0.109 g,4.187 g,0.379 g,0.683 g,0,0.968 g,1.840 g,1.766 g,0.565 g,1.105 g,2.346 g,1.114 g,0.717 g,0.307 g,1.145 g,1.279 g,0.46 g,0.0 g,0.46 g,0,0,0,0,0,0,24.26 g,15.259 g,7.023 g,0.724 g,72.00 mg,0.0 g,3.68 g,0.00 mg,0.00 mg,51.80 g
9,Vegetarian fillets,100 g,290,18g,2.8g,0,490.00 mg,82.0 mg,102.00 mcg,0.00 mcg,12.000 mg,0,0.900 mg,1.100 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,4.20 mcg,1.500 mg,0.0 mg,0.00 IU,3.45 mg,3.45 mg,0.0 mcg,95.00 mg,0.925 mg,2.00 mg,23.00 mg,0,450.00 mg,600.00 mg,1.0 mcg,1.40 mg,23.00 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9.00 g,6.1 g,0.80 g,0,0,0,0,0,0,18.00 g,2.849 g,4.376 g,9.332 g,0.00 mg,0.0 g,5.00 g,0.00 mg,0.00 mg,45.00 g
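
As a quick check of the position-versus-label point above, here is a minimal sketch on a tiny in-memory CSV. The three-row `csv_text` is a hypothetical stand-in for the real file, and the names `default_df`, `by_position`, and `by_label` are illustrative only.

```python
import io

import pandas as pd

# Hypothetical CSV whose first header cell is empty, mimicking a file
# that was saved together with its index.
csv_text = """,name,calories
0,Cornstarch,381
1,"Nuts, pecans",691
2,"Eggplant, raw",25
"""

# Without index_col, the unnamed first column becomes a regular column.
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df.columns.tolist())  # ['Unnamed: 0', 'name', 'calories']

# Selecting the index column by position or by label.
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)
by_label = pd.read_csv(io.StringIO(csv_text), index_col="Unnamed: 0")

print(by_position.columns.tolist())  # ['name', 'calories']
print(by_position.index.tolist() == by_label.index.tolist())
```

Both calls should produce the same index values and the same remaining columns, so which form to use is mostly a matter of readability.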


## The `sample()` Method

`sample()` is called on a DataFrame and returns a random record/observation (row) from that DataFrame.
* You can use the `random_state` parameter to make the selection reproducible. It provides a seed for the random number generator, so for any given random state (and the same DataFrame), `sample()` returns the same record(s) each time.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html

In [36]:
nutrition.sample()

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
7058,"Cereals, prepared with water, with cinnamon an...",100 g,96,1.2g,0.2g,0,111.00 mg,7.2 mg,48.00 mcg,43.00 mcg,2.155 mg,0.220 mg,0.177 mg,0.168 mg,366.00 IU,110.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,31.00 mcg,0,0.00 mcg,0.220 mg,0.0 mg,0.00 IU,0.12 mg,0.12 mg,0.5 mcg,61.00 mg,0.082 mg,2.11 mg,24.00 mg,0.643 mg,80.00 mg,71.00 mg,1.6 mcg,0.57 mg,2.37 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,18.95 g,2.0 g,6.31 g,0,0,0,0,0,0,1.21 g,0.191 g,0.371 g,0.422 g,0.00 mg,0.0 g,0.73 g,0.00 mg,0.00 mg,76.74 g


In [42]:
nutrition.sample(random_state=2)

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
7652,"Beef, raw, all grades, trimmed to 0"" fat, sepa...",100 g,121,2.9g,1.1g,62mg,54.00 mg,64.8 mg,4.00 mcg,0.00 mcg,6.462 mg,0.358 mg,0.236 mg,0.064 mg,6.00 IU,2.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,1.64 mcg,0.635 mg,0.0 mg,1.00 IU,0.23 mg,0.23 mg,1.5 mcg,13.00 mg,0.048 mg,2.34 mg,12.00 mg,0.004 mg,220.00 mg,313.00 mg,22.2 mcg,3.70 mg,23.59 g,1.540 g,1.730 g,2.491 g,0.259 g,4.206 g,1.111 g,0.957 g,0.119 g,1.203 g,2.219 g,2.480 g,0.685 g,1.028 g,1.083 g,1.047 g,1.212 g,0.289 g,0.963 g,1.271 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,2.94 g,1.120 g,1.292 g,0.223 g,62.00 mg,0.0 g,1.12 g,0.00 mg,0.00 mg,72.96 g


You can use a comparison operator to check that the same random state returns the same records every time.

In [43]:
nutrition.sample(random_state=2) == nutrition.sample(random_state=2)

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
7652,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True
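
The element-wise `==` comparison returns a whole DataFrame of booleans, and it reports `False` wherever both cells are missing (`NaN` is not equal to itself). A more compact check, sketched below on a small made-up DataFrame, is the `equals()` method, which returns a single boolean and treats missing values in the same position as equal. The DataFrame `df` and its contents are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative DataFrame with one missing value per row in 'fat'.
df = pd.DataFrame(
    {"name": list("abcdefghij"), "fat": [np.nan] * 10, "calories": range(10)}
)

first = df.sample(n=3, random_state=2)
second = df.sample(n=3, random_state=2)

# Element-wise comparison: the 'fat' column compares NaN with NaN.
print((first == second)["fat"].any())  # False

# equals() gives one answer for the whole DataFrame.
same = first.equals(second)
print(same)  # True
```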


You can choose how many records to return using the `n` parameter. It defaults to 1.


In [41]:
nutrition.sample(n = 5)

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
7972,"Pork, broiled, cooked, boneless, separable lea...",100 g,247,16g,4.4g,88mg,58.00 mg,75.6 mg,0.00 mcg,0.00 mcg,8.699 mg,1.217 mg,0.354 mg,0.544 mg,14.00 IU,4.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,0.72 mcg,0.547 mg,0.0 mg,46.00 IU,0.24 mg,0.24 mg,0.0 mcg,8.00 mg,0.070 mg,0.89 mg,24.00 mg,0.009 mg,272.00 mg,408.00 mg,44.7 mcg,2.33 mg,26.28 g,1.505 g,1.693 g,2.458 g,0.296 g,4.014 g,1.188 g,1.069 g,0.092 g,1.244 g,2.163 g,2.337 g,0.720 g,1.092 g,1.062 g,1.097 g,1.156 g,0.313 g,1.041 g,1.329 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,15.73 g,4.408 g,5.227 g,1.933 g,88.00 mg,0.0 g,1.05 g,0.00 mg,0.00 mg,57.45 g
6402,"Cereals ready-to-eat, Natural Granola Apple Cr...",100 g,418,11g,1.1g,2mg,47.00 mg,0,31.00 mcg,0,1.980 mg,0,0.260 mg,0.340 mg,19.00 IU,0,0,0,0,0,0,0.14 mcg,0.193 mg,3.1 mg,0,1.63 mg,1.63 mg,0,98.00 mg,0,2.47 mg,106.00 mg,0,334.00 mg,444.00 mg,0,2.44 mg,9.21 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,74.73 g,9.9 g,27.43 g,0,0,0,0,0,0,11.06 g,1.130 g,6.390 g,3.000 g,2.00 mg,0,1.79 g,0,0,3.21 g
816,"Lettuce, raw, cos or romaine",100 g,17,0.3g,,0,8.00 mg,9.9 mg,136.00 mcg,0.00 mcg,0.313 mg,0.142 mg,0.067 mg,0.072 mg,8710.00 IU,436.00 mcg,0.00 mcg,5226.00 mcg,0.00 mcg,2312.00 mcg,0,0.00 mcg,0.074 mg,4.0 mg,0.00 IU,0.13 mg,0.13 mg,102.5 mcg,33.00 mg,0.048 mg,0.97 mg,14.00 mg,0.155 mg,30.00 mg,247.00 mg,0.4 mcg,0.23 mg,1.23 g,0.056 g,0.054 g,0.139 g,0.006 g,0.178 g,0.049 g,0.021 g,0,0.045 g,0.076 g,0.064 g,0.015 g,0.065 g,0.045 g,0.050 g,0.043 g,0.010 g,0.025 g,0.055 g,3.29 g,2.1 g,1.19 g,0.80 g,0.00 g,0.39 g,0.00 g,0.00 g,0.00 g,0.30 g,0.039 g,0.012 g,0.160 g,0.00 mg,0.0 g,0.58 g,0.00 mg,0.00 mg,94.61 g
7831,"Cereals, without salt, stove-top, cooked with ...",100 g,56,0.2g,0.1g,0,34.00 mg,0,29.00 mcg,15.00 mcg,0.803 mg,0.267 mg,0.067 mg,0.093 mg,0,0,0,0,0,0,0,0,0.031 mg,0.0 mg,0,0.04 mg,0.04 mg,0.0 mcg,104.00 mg,0.036 mg,4.09 mg,7.00 mg,0.188 mg,47.00 mg,23.00 mg,0,0.22 mg,1.44 g,0.065 g,0.062 g,0.079 g,0.028 g,0.573 g,0.070 g,0.031 g,0.000 g,0.055 g,0.111 g,0.057 g,0.029 g,0.072 g,0.351 g,0.091 g,0.038 g,0.014 g,0.060 g,0.076 g,11.74 g,0.7 g,0.07 g,0.00 g,0.00 g,0.00 g,0.00 g,0.00 g,0.07 g,0.20 g,0.051 g,0.030 g,0.102 g,0,0,0.42 g,0,0,86.20 g
7608,"Beef, braised, cooked, all grades, trimmed to ...",100 g,238,10g,3.8g,90mg,45.00 mg,130.8 mg,9.00 mcg,0.00 mcg,3.650 mg,0.360 mg,0.240 mg,0.070 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,2.62 mcg,0.270 mg,0.0 mg,7.00 IU,0.17 mg,0.17 mg,1.7 mcg,5.00 mg,0.118 mg,3.16 mg,24.00 mg,0.017 mg,215.00 mg,319.00 mg,32.5 mcg,4.34 mg,34.34 g,2.071 g,2.170 g,3.137 g,0.385 g,5.159 g,1.874 g,1.176 g,0,1.544 g,2.714 g,2.857 g,0.879 g,1.341 g,1.517 g,1.313 g,1.500 g,0.385 g,1.154 g,1.670 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,10.13 g,3.780 g,4.150 g,0.420 g,90.00 mg,0.0 g,1.66 g,0.00 mg,0.00 mg,54.68 g


You can also use the `frac` parameter to indicate what fraction of the data you want to randomly sample (for example, `frac = 0.01` samples 1% of the rows). It cannot be used at the same time as the `n` parameter. Think about why that makes sense.

In [51]:
nutrition.sample(frac = 0.01)

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
1645,"Fish, dry heat, cooked, swordfish",100 g,172,7.9g,1.9g,78mg,97.00 mg,77.5 mg,2.00 mcg,0.00 mcg,9.254 mg,0.417 mg,0.063 mg,0.089 mg,129.00 IU,43.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,1.62 mcg,0.615 mg,0.0 mg,666.00 IU,2.41 mg,2.41 mg,0.1 mcg,6.00 mg,0.046 mg,0.45 mg,35.00 mg,0.013 mg,304.00 mg,499.00 mg,68.5 mcg,0.78 mg,23.45 g,1.429 g,1.413 g,2.418 g,0.253 g,3.525 g,1.133 g,0.695 g,0,1.088 g,1.919 g,2.168 g,0.699 g,0.922 g,0.835 g,0.964 g,1.035 g,0.265 g,0.797 g,1.216 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,7.93 g,1.911 g,3.544 g,1.368 g,78.00 mg,0.0 g,1.71 g,0.00 mg,0.00 mg,68.26 g
4520,"Cereals ready-to-eat, TOOTIE FRUITIES, MALT-O-...",100 g,391,3.2g,0.7g,0,453.00 mg,5.8 mg,625.00 mcg,606.00 mcg,15.630 mg,0,1.330 mg,1.160 mg,1563.00 IU,448.00 mcg,24.00 mcg,37.00 mcg,0.00 mcg,625.00 mcg,0,4.69 mcg,1.550 mg,18.8 mg,125.00 IU,0.01 mg,0.01 mg,0.1 mcg,313.00 mg,0.064 mg,28.12 mg,24.00 mg,0,78.00 mg,104.00 mg,9.9 mcg,11.72 mg,4.69 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,85.94 g,2.2 g,45.31 g,0,0,0,0,0,0,3.20 g,0.687 g,1.470 g,0.976 g,0.00 mg,0.0 g,3.67 g,0.00 mg,0.00 mg,2.50 g
244,"Fish, raw, bluefish",100 g,124,4.2g,0.9g,59mg,60.00 mg,0,2.00 mcg,0.00 mcg,5.950 mg,0.828 mg,0.080 mg,0.058 mg,398.00 IU,120.00 mcg,0,0,0,0,0,5.39 mcg,0.402 mg,0.0 mg,0,0,0,0,7.00 mg,0.053 mg,0.48 mg,33.00 mg,0.021 mg,227.00 mg,372.00 mg,36.5 mcg,0.81 mg,20.04 g,1.212 g,1.199 g,2.052 g,0.215 g,2.991 g,0.962 g,0.590 g,0,0.923 g,1.629 g,1.840 g,0.593 g,0.782 g,0.709 g,0.818 g,0.878 g,0.224 g,0.676 g,1.032 g,0.00 g,0.0 g,0,0,0,0,0,0,0,4.24 g,0.915 g,1.793 g,1.060 g,59.00 mg,0.0 g,1.04 g,0,0,70.86 g
6608,"Cheese product, vitamin D fortified, American,...",100 g,312,23g,13g,78mg,1309.00 mg,35.8 mg,18.00 mcg,0.00 mcg,0.170 mg,0.463 mg,0.425 mg,0.040 mg,1261.00 IU,270.00 mcg,1.00 mcg,248.00 mcg,23.00 mcg,60.00 mcg,0,1.52 mcg,0.124 mg,0.0 mg,259.00 IU,0.84 mg,0.84 mg,3.1 mcg,1360.00 mg,0.037 mg,0.89 mg,33.00 mg,0.048 mg,799.00 mg,283.00 mg,16.2 mcg,2.13 mg,17.12 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8.80 g,0.0 g,6.19 g,0.00 g,0.14 g,0.00 g,6.05 g,0.00 g,0.00 g,23.11 g,13.341 g,6.243 g,1.005 g,78.00 mg,0.0 g,6.75 g,0.00 mg,0.00 mg,44.22 g
1293,"Taro, with salt, cooked",100 g,142,0.1g,,0,251.00 mg,21.3 mg,19.00 mcg,0.00 mcg,0.510 mg,0.336 mg,0.028 mg,0.107 mg,84.00 IU,4.00 mcg,0.00 mcg,39.00 mcg,22.00 mcg,0.00 mcg,0,0.00 mcg,0.331 mg,5.0 mg,0.00 IU,2.93 mg,2.93 mg,1.2 mcg,18.00 mg,0.201 mg,0.72 mg,30.00 mg,0.449 mg,76.00 mg,484.00 mg,0.9 mcg,0.27 mg,0.52 g,0.025 g,0.036 g,0.066 g,0.011 g,0.060 g,0.026 g,0.012 g,0,0.019 g,0.038 g,0.023 g,0.007 g,0.028 g,0.021 g,0.032 g,0.024 g,0.008 g,0.019 g,0.028 g,34.60 g,5.1 g,0.49 g,0,0,0,0,0,0,0.11 g,0.023 g,0.009 g,0.046 g,0.00 mg,0.0 g,0.97 g,0.00 mg,0.00 mg,63.80 g
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
300,"Gravy, dry, au jus",100 g,313,9.6g,2g,4mg,11588.00 mg,0,81.00 mcg,0.00 mcg,4.087 mg,0.157 mg,0.324 mg,0.472 mg,10.00 IU,0.00 mcg,0,0,0,0,0,0.32 mcg,0.174 mg,1.0 mg,0,0,0,0,140.00 mg,0.120 mg,9.30 mg,56.00 mg,0.268 mg,153.00 mg,279.00 mg,6.2 mcg,0.70 mg,9.20 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,47.49 g,0,0,0,0,0,0,0,0,9.63 g,2.026 g,4.683 g,0.230 g,4.00 mg,0.0 g,30.94 g,0,0,2.74 g
2706,"Fish, prepared, frozen, fish sticks",100 g,277,16g,3.7g,28mg,402.00 mg,42.8 mg,24.00 mcg,0.00 mcg,1.536 mg,0.272 mg,0.116 mg,0.122 mg,18.00 IU,4.00 mcg,0.00 mcg,3.00 mcg,2.00 mcg,42.00 mcg,0,0.96 mcg,0.078 mg,0.0 mg,1.00 IU,6.88 mg,6.88 mg,4.7 mcg,16.00 mg,0.059 mg,0.84 mg,25.00 mg,0.182 mg,191.00 mg,185.00 mg,15.7 mcg,0.42 mg,11.01 g,0.546 g,0.592 g,0.816 g,0,2.317 g,0.474 g,0.230 g,0.002 g,0.429 g,0.852 g,0.795 g,0.296 g,0.470 g,0.783 g,0.490 g,0.372 g,0.122 g,0.286 g,0.500 g,21.66 g,1.5 g,1.65 g,0.10 g,0.00 g,0.30 g,0.12 g,0.58 g,0.56 g,16.23 g,3.733 g,3.193 g,7.824 g,28.00 mg,0.0 g,1.61 g,0.00 mg,0.00 mg,49.50 g
7995,"Beef, raw, select, trimmed to 0"" fat, separabl...",100 g,124,4.1g,1.6g,69mg,85.00 mg,62.2 mg,3.00 mcg,0.00 mcg,6.205 mg,0.770 mg,0.165 mg,0.080 mg,7.00 IU,2.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,1.95 mcg,0.630 mg,0.0 mg,3.00 IU,0.11 mg,0.11 mg,1.5 mcg,13.00 mg,0.066 mg,2.08 mg,24.00 mg,0.002 mg,218.00 mg,353.00 mg,27.1 mcg,5.30 mg,21.74 g,1.257 g,1.462 g,2.002 g,0.231 g,3.541 g,0.968 g,0.717 g,0.110 g,0.952 g,1.800 g,1.956 g,0.634 g,0.848 g,0.895 g,0.854 g,0.985 g,0.249 g,0.771 g,1.006 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,4.14 g,1.580 g,2.300 g,0.230 g,69.00 mg,0.0 g,1.07 g,0.00 mg,0.00 mg,73.43 g
4158,"KEEBLER, Oatmeal Chocolate Chip Cookies, CHIPS...",100 g,500,24g,9.4g,0,332.00 mg,0,42.00 mcg,0,1.800 mg,0,0.160 mg,0.270 mg,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3.20 mg,39.00 mg,0,76.00 mg,185.00 mg,0,0.60 mg,6.40 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,65.30 g,4.4 g,32.30 g,0,0,0,0,0,0,24.10 g,9.400 g,6.300 g,6.900 g,0.00 mg,0,0,0,0,3.00 g


In [52]:
nutrition.sample(frac = 0.01).info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 88 entries, 3724 to 7184
Data columns (total 76 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   name                         88 non-null     object
 1   serving_size                 88 non-null     object
 2   calories                     88 non-null     int64 
 3   total_fat                    88 non-null     object
 4   saturated_fat                73 non-null     object
 5   cholesterol                  88 non-null     object
 6   sodium                       88 non-null     object
 7   choline                      88 non-null     object
 8   folate                       88 non-null     object
 9   folic_acid                   88 non-null     object
 10  niacin                       88 non-null     object
 11  pantothenic_acid             88 non-null     object
 12  riboflavin                   88 non-null     object
 13  thiamin                      88 
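
To make the relationship between `frac` and `n` concrete, here is a minimal sketch on a made-up 200-row DataFrame (the name `df` and its contents are assumptions, not the nutrition data). It shows that `frac` is converted into a row count, and that passing both parameters is rejected.

```python
import pandas as pd

# Illustrative DataFrame with 200 rows.
df = pd.DataFrame({"value": range(200)})

# 1% of 200 rows is 2 rows.
one_percent = df.sample(frac=0.01, random_state=0)
print(len(one_percent))  # 2

# n and frac both specify the sample size, so pandas refuses to guess
# which one should win and raises a ValueError.
raised = False
try:
    df.sample(n=5, frac=0.01)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

This is consistent with the `info()` output above, where 1% of the nutrition data comes out as 88 entries.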

## BONUS - Sampling with Replacement or Weights

The sampling method can get even fancier.
* **Sampling with replacement** refers to the act of placing a record back into the population after it is selected, such that the probability of selecting that item in subsequent selections is unchanged.
* This makes it possible to pick the exact same record multiple times
* This is the basis of **bootstrapping** in statistics, a resampling technique that repeatedly samples a dataset with replacement


Within the `sample()` method, we can use the `replace` parameter to determine whether we sample with replacement. By setting `replace` to `True`, there is a chance to select the same record more than once.

In [53]:
nutrition.sample(n=3, replace = True)

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
868,"Veal, raw, external fat only",100 g,503,52g,19g,86mg,89.00 mg,32.0 mg,0.00 mcg,0.00 mcg,2.780 mg,0.260 mg,0.090 mg,0.035 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,1.35 mcg,0.154 mg,0.0 mg,220.00 IU,0.30 mg,0.30 mg,2.8 mcg,31.00 mg,0.045 mg,0.73 mg,19.00 mg,0.008 mg,133.00 mg,107.00 mg,5.2 mcg,0.83 mg,8.85 g,0.526 g,0.520 g,0.763 g,0.100 g,1.399 g,0.455 g,0.321 g,0.060 g,0.436 g,0.704 g,0.729 g,0.206 g,0.357 g,0.369 g,0.331 g,0.387 g,0.090 g,0.282 g,0.489 g,0.89 g,0.0 g,0.00 g,0,0,0,0,0,0,51.60 g,19.082 g,24.815 g,1.631 g,86.00 mg,0.0 g,0.47 g,0.00 mg,0.00 mg,38.19 g
2981,"Chicken, stewed, cooked, meat only, stewing",100 g,237,12g,3.1g,83mg,78.00 mg,0,6.00 mcg,0.00 mcg,6.408 mg,0.861 mg,0.277 mg,0.112 mg,112.00 IU,34.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,0.26 mcg,0.310 mg,0.0 mg,0,0.30 mg,0.30 mg,3.1 mcg,13.00 mg,0.116 mg,1.43 mg,22.00 mg,0.022 mg,204.00 mg,202.00 mg,25.2 mcg,2.06 mg,30.42 g,1.660 g,1.835 g,2.711 g,0.389 g,4.556 g,1.494 g,0.944 g,0,1.606 g,2.283 g,2.585 g,0.842 g,1.207 g,1.251 g,1.046 g,1.285 g,0.356 g,1.027 g,1.509 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,11.89 g,3.100 g,4.050 g,2.830 g,83.00 mg,0.0 g,1.35 g,0.00 mg,0.00 mg,56.35 g
25,"PACE, Green Taco Sauce",100 g,25,0g,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.00 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6.25 g,0,6.25 g,0,0,0,0,0,0,0.00 g,0.000 g,0,0,0.00 mg,0,3.05 g,0,0,90.70 g
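
With almost 9,000 rows, a sample of 3 will rarely contain a repeat, so the effect of `replace` is hard to see in the output above. The sketch below uses a made-up three-row DataFrame (an assumption for illustration) and draws more rows than exist, which is only possible with replacement.

```python
import pandas as pd

# Illustrative three-row DataFrame.
df = pd.DataFrame({"food": ["apple", "bread", "cheese"]})

# Ten draws from three rows: repeats are unavoidable.
boot = df.sample(n=10, replace=True, random_state=0)
print(len(boot))              # 10
print(boot.index.is_unique)   # False

# Without replacement, the sample cannot be larger than the population.
raised = False
try:
    df.sample(n=10)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```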


**Weighted sampling** assigns each record a weight that controls how likely it is to be selected. The higher the weight, the higher the likelihood of being selected.

Within the `sample()` method we do this using the `weights` parameter.
* If `weights` is not used, all records have equal probability weighting - that is, there is an equal likelihood of selecting any given record.
* This becomes useful when the indices have meaningful labels that would justify giving more weight to some records and less weight to others.

As an example, we start by creating a Series of weights, indexed by the labels of the records that we want to weight in the main DataFrame.

In [56]:
weights = pd.Series(data = [10, 10, 10, 1, 2], index=[7, 17, 29, 5, 6])
weights

7     10
17    10
29    10
5      1
6      2
dtype: int64

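Now let's use these weights for our sampling. When `weights` is a Series, Pandas aligns it with the DataFrame on the index and gives a weight of zero to every record that does not appear in it, so only records 7, 17, 29, 5, and 6 can be selected here. Among those, the higher weighted indices are more likely to be selected.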

In [60]:
nutrition.sample(n=3, weights= weights)

Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,lucopene,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
29,"Nuts, dried, pine nuts",100 g,673,68g,4.9g,0,2.00 mg,55.8 mg,34.00 mcg,0.00 mcg,4.387 mg,0.313 mg,0.227 mg,0.364 mg,29.00 IU,1.00 mcg,0.00 mcg,17.00 mcg,0.00 mcg,9.00 mcg,0,0.00 mcg,0.094 mg,0.8 mg,0.00 IU,9.33 mg,9.33 mg,53.9 mcg,16.00 mg,1.324 mg,5.53 mg,251.00 mg,8.802 mg,575.00 mg,597.00 mg,0.7 mcg,6.45 mg,13.69 g,0.684 g,2.413 g,1.303 g,0.289 g,2.926 g,0.691 g,0.341 g,0,0.542 g,0.991 g,0.540 g,0.259 g,0.524 g,0.673 g,0.835 g,0.370 g,0.107 g,0.509 g,0.687 g,13.08 g,3.7 g,3.59 g,0.07 g,0.00 g,0.07 g,0.00 g,0.00 g,3.45 g,68.37 g,4.899 g,18.764 g,34.071 g,0.00 mg,0.0 g,2.59 g,0.00 mg,0.00 mg,2.28 g
7,"Lamb, raw, ground",100 g,282,23g,10g,73mg,59.00 mg,69.3 mg,18.00 mcg,0.00 mcg,5.960 mg,0.650 mg,0.210 mg,0.110 mg,0.00 IU,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0.00 mcg,0,2.31 mcg,0.130 mg,0.0 mg,2.00 IU,0.20 mg,0.20 mg,3.6 mcg,16.00 mg,0.101 mg,1.55 mg,21.00 mg,0.019 mg,157.00 mg,222.00 mg,18.8 mcg,3.41 mg,16.56 g,0.996 g,0.984 g,1.457 g,0.198 g,2.402 g,0.809 g,0.524 g,0,0.799 g,1.288 g,1.462 g,0.425 g,0.674 g,0.694 g,0.615 g,0.709 g,0.193 g,0.556 g,0.893 g,0.00 g,0.0 g,0.00 g,0,0,0,0,0,0,23.41 g,10.190 g,9.600 g,1.850 g,73.00 mg,0.0 g,0.87 g,0.00 mg,0.00 mg,59.47 g
17,"Peppers, raw, jalapeno",100 g,29,0.4g,0.1g,0,3.00 mg,7.5 mg,27.00 mcg,0.00 mcg,1.280 mg,0.315 mg,0.070 mg,0.040 mg,1078.00 IU,54.00 mcg,67.00 mcg,561.00 mcg,105.00 mcg,861.00 mcg,0,0.00 mcg,0.419 mg,118.6 mg,0.00 IU,3.58 mg,3.58 mg,18.5 mcg,12.00 mg,0.046 mg,0.25 mg,15.00 mg,0.097 mg,26.00 mg,248.00 mg,0.4 mcg,0.14 mg,0.91 g,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6.50 g,2.8 g,4.12 g,2.63 g,0.00 g,1.48 g,0.00 g,0.00 g,0.00 g,0.37 g,0.092 g,0.029 g,0.112 g,0.00 mg,0.0 g,0.53 g,0.00 mg,0.00 mg,91.69 g
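
The following sketch illustrates how a `weights` Series interacts with the index, using a made-up ten-row DataFrame and a two-entry weights Series (both are assumptions for illustration). Rows missing from the Series receive a weight of zero, so they are never drawn.

```python
import pandas as pd

# Illustrative DataFrame with index labels 0 through 9.
df = pd.DataFrame({"food": list("abcdefghij")})

# Only labels 7 and 5 get a weight; every other row is treated as weight 0.
weights = pd.Series([10, 1], index=[7, 5])

# Two draws without replacement can only return the two weighted rows.
picked = df.sample(n=2, weights=weights, random_state=0)
print(sorted(picked.index))  # [5, 7]

# Many draws with replacement show the 10-to-1 preference for label 7.
draws = df.sample(n=1000, replace=True, weights=weights, random_state=0)
counts = draws.index.value_counts()
print(counts)
```

The weights do not need to sum to 1. Pandas normalizes them, so only their relative sizes matter.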
