# [Exploring the Banana Index: Environmental Impact of Food Production](https://eds-217-essential-python.github.io/course-materials/eod-practice/eod-day5.html)

In this activity, you’ll explore the “Banana Index” dataset, which compares the environmental impact of various food products to that of a banana. The Economist compiled these data in 2023 and published them on GitHub. This exercise gives you practice with pandas DataFrames, data manipulation, and visualization while you learn about the environmental impacts of food production.

## Setup

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data
url = "https://github.com/TheEconomist/banana-index-data/releases/download/1.0/bananaindex.csv"
df = pd.read_csv(url)

In [None]:
# Display the first few rows:
print(df.head())

In [None]:
# Display the dataframe info:
# df.info() prints its own summary and returns None, so no print() is needed
df.info()

## 1. Data Preparation

a. Set the index of the DataFrame to be the ‘entity’ column.

In [None]:
df.set_index('entity', inplace=True)

b. Remove the ‘year’, ‘Banana values’, ‘type’, ‘Unnamed: 16’, and ‘Chart?’ columns.

In [None]:
df.drop(columns=['year', 'Banana values', 'type', 'Unnamed: 16', 'Chart?'], inplace = True)
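
Because steps a and b modify `df` in place, re-running either cell raises a `KeyError` once the column is gone. As a minimal sketch of a re-runnable alternative, the two steps can be chained into a new DataFrame. The toy data and the name `clean` are made up for illustration and are not part of the exercise.

```python
import pandas as pd

# Toy data for illustration only; not the real Banana Index values.
toy = pd.DataFrame(
    {"entity": ["Bananas", "Beef"], "year": [2022, 2022], "value": [1.0, 9.0]}
)

# Chaining returns a new DataFrame and leaves `toy` untouched, so the cell
# can be re-run safely. errors="ignore" skips labels that are not present
# (here 'Chart?' does not exist in the toy data).
clean = toy.set_index("entity").drop(columns=["year", "Chart?"], errors="ignore")
print(clean)
```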

c. Display the first few rows of the modified DataFrame.

In [None]:
df.head()

## 2. Exploring Banana Scores

a. For each of the pre-computed banana score columns (kg, calories, and protein), show the 10 highest-scoring food products.

In [None]:
# Sort by each banana column in turn, not only by kg
for column in ['Bananas index (kg)', 'Bananas index (1000 kcalories)', 'Bananas index (100g protein)']:
    print(df[[column]].sort_values(by = column, ascending = False).head(10))

b. Edit the function below so that it returns the top 10 scores for a given column:

```python
def return_top_ten(df, column):
    """ Return the top 10 values of a column """
    pass
```

In [None]:
def return_top_ten(df, column):
    return df[[column]].sort_values(by = column, ascending = False).head(10)
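
As an alternative sketch, pandas also provides `DataFrame.nlargest`, which picks the top rows directly instead of sorting the whole frame first. The toy DataFrame and the helper name `return_top_n` below are hypothetical and only illustrate the call.

```python
import pandas as pd

# Toy data for illustration only; not the real Banana Index values.
toy = pd.DataFrame(
    {"score": [3.0, 9.5, 1.2, 7.7]},
    index=["apples", "beef", "bananas", "cheese"],
)

def return_top_n(df, column, n=10):
    """Return the n rows with the largest values in `column` (hypothetical helper)."""
    return df[[column]].nlargest(n, column)

top_two = return_top_n(toy, "score", n=2)
print(top_two)
```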

Use your function to display the results for each of the three Banana index columns.

In [None]:
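# Only the last expression in a cell is displayed automatically, so print each result
print(return_top_ten(df, 'Bananas index (kg)'))
print(return_top_ten(df, 'Bananas index (1000 kcalories)'))
print(return_top_ten(df, 'Bananas index (100g protein)'))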

## 3. Common High-Scoring Foods

Identify which foods, if any, appear in the top 10 for all three banana score lists (kg, calories, and protein).

In [None]:
top_10_kg = return_top_ten(df, 'Bananas index (kg)')
top_10_cal = return_top_ten(df, 'Bananas index (1000 kcalories)')
top_10_protein = return_top_ten(df, 'Bananas index (100g protein)')

The cell above is one way to do it. Another is to collect the top-10 sets in a loop:
```python
banana_df = df.filter(like='Bananas')
print(banana_df.head())

list_of_top_10 = []
for column in banana_df.columns:
    top_10 = return_top_ten(df, column)
    list_of_top_10.append(set(top_10.index))

print(list_of_top_10)
```

Then unpack the list into `set.intersection` with `set.intersection(*list_of_top_10)`.
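
A minimal sketch of passing a list of sets to `set.intersection`, using made-up sets in place of the real top-10 results:

```python
# Made-up sets, shortened to three items each for illustration.
list_of_top_10 = [
    {"beef", "lamb", "cheese"},
    {"beef", "cheese", "coffee"},
    {"beef", "prawns", "cheese"},
]

# The * operator unpacks the list so each set becomes its own argument.
in_all = set.intersection(*list_of_top_10)
print(sorted(in_all))
```

The unpacking form works for any number of columns, so the code does not change if more banana score columns are added.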

In [None]:
set(top_10_kg.index)

In [None]:
in_all_three = set.intersection(set(top_10_kg.index), set(top_10_cal.index), set(top_10_protein.index))
print(in_all_three)

## 4. Land Use Analysis

a. Create a new column named ‘Bananas index (land use 1000 kcal)’, calculating that food item’s use of land for every 1,000 kcal in comparison to a banana.

In [None]:
banana_row = df.filter(like = 'Bananas', axis = 0)
print(banana_row)

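# Land use per 1,000 kcal for bananas (2.717877 in this dataset), read from the row rather than hardcoded
banana_land_use = banana_row['land_use_1000kcal'].iloc[0]

# Dividing each food's land use by the banana value expresses it in "bananas"
df['Bananas index (land use 1000 kcal)'] = df['land_use_1000kcal'] / banana_land_use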
print(df[['Bananas index (land use 1000 kcal)', 'land_use_1000kcal']])
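
If the banana row's index label is known exactly, the reference value can also be looked up by label with `.loc`. The sketch below uses toy numbers and assumes the label is exactly 'Bananas'; check the printed `banana_row` before relying on that in the real dataset.

```python
import pandas as pd

# Toy values for illustration only; not the real dataset.
toy = pd.DataFrame(
    {"land_use_1000kcal": [2.0, 100.0, 10.0]},
    index=["Bananas", "Beef", "Cheese"],
)

# Look up the reference value by label (assumes the row is labelled exactly 'Bananas').
banana_land_use = toy.loc["Bananas", "land_use_1000kcal"]

# Divide every row by the banana value, so bananas score exactly 1.
toy["Bananas index (land use 1000 kcal)"] = toy["land_use_1000kcal"] / banana_land_use
print(toy)
```

With these toy numbers, beef uses 50 times as much land per 1,000 kcal as bananas, so its index is 50.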

b. Display the 10 foods with the highest land use score.

In [None]:
top_10_land = return_top_ten(df, 'Bananas index (land use 1000 kcal)')
print(top_10_land)

c. Compare this list with the previous top 10 lists. Are there any common foods?

In [None]:
in_all_four = set.intersection(set(top_10_kg.index), set(top_10_cal.index), set(top_10_protein.index), set(top_10_land.index))
print(in_all_four)
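
The four-way intersection only reports foods shared by every list, so it can come back empty even when the land-use list overlaps with some of the others. A sketch of a pairwise comparison follows, using made-up sets and a hypothetical `top_sets` dictionary in place of the real top-10 indexes.

```python
# Made-up sets standing in for the top-10 indexes.
top_sets = {
    "kg": {"beef", "lamb", "coffee"},
    "calories": {"beef", "prawns", "coffee"},
    "land use": {"beef", "lamb", "cheese"},
}

# Compare the land-use list with each of the other lists in turn.
overlaps = {
    name: sorted(top_sets["land use"] & foods)
    for name, foods in top_sets.items()
    if name != "land use"
}
print(overlaps)
```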

## 5. Cheese Analysis

Identify the type of cheese with the highest banana score per 1,000 kcal. How does it compare to other cheeses in the dataset?

In [None]:
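cheeses = df.filter(like = 'cheese', axis=0)

# This ranks cheeses by the land-use index from section 4; for the pre-computed
# per-1,000-kcal score named in the question, pass 'Bananas index (1000 kcalories)' instead
return_top_ten(cheeses, 'Bananas index (land use 1000 kcal)')
# Observation from this ranking: cottage cheese stands out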
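
Note that `filter(like='cheese')` is case-sensitive, so it would miss a label written with a capital C. How the labels are capitalized in the real dataset is not assumed here. The sketch below uses made-up labels to show a case-insensitive match with `str.contains`.

```python
import pandas as pd

# Made-up labels and scores for illustration only.
toy = pd.DataFrame(
    {"score": [4.0, 2.5, 1.0]},
    index=["Cheddar Cheese", "cottage cheese", "Bananas"],
)

# case=False makes the match ignore capitalization.
mask = toy.index.str.contains("cheese", case=False)
toy_cheeses = toy[mask]
print(toy_cheeses)
```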