# 🔎 Exploratory Data Analysis (EDA) Project
**Author:** Kristine Steele
**Date:** September 29, 2025
**Purpose:**
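This notebook demonstrates exploratory data analysis (EDA) with the Python libraries pandas, matplotlib, and seaborn. It inspects three structured datasets (nutrition values, 2022 fruit prices, and 2022 vegetable prices) as a first step toward analyzing, visualizing, and gaining insights from them.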

In [1]:
# Import required libraries
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

## 🥗 Inspect Nutrition Data
Display the first 10 rows, shape, and data types of the nutrition dataset.

In [10]:
# Load the nutrition CSV file into a DataFrame (read once; the extra `df` copy was unused)
df_nutrition = pd.read_csv('data/nutrition_values.csv')

# Inspect the first rows, the shape, and the column data types
print(df_nutrition.head(10))
print(df_nutrition.shape)
print(df_nutrition.dtypes)


           Name  Calories (per 100g)  Carbohydrates (g)  Protein (g)  Fat (g)  \
0      Amaranth                   23                4.0          2.1      0.3   
1         Apple                   52               14.0          0.3      0.2   
2       Apricot                   48               11.0          1.4      0.4   
3     Asparagus                   20                3.7          2.2      0.2   
4       Avocado                  160                8.5          2.0     14.7   
5        Banana                   89               23.0          1.3      0.3   
6         Beans                   31                7.0          1.8      0.2   
7      Beetroot                   43                9.6          1.6      0.2   
8   Bell Pepper                   20                4.6          0.9      0.2   
9  Bitter Gourd                   19                4.3          0.8      0.2   

   Fiber (g)  Vitamin C (mg)  Zinc (mg)  Potassium (mg)  Iron (mg)  \
0        2.0            70.0       0.9
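
The same three inspection calls are repeated for every dataset in this notebook, so they could be wrapped in a small helper. The sketch below is one possible way to do that; `inspect_frame` and its returned keys are hypothetical names, and the sample rows are copied from the preview above so the block runs without the CSV file.

```python
import pandas as pd


def inspect_frame(df: pd.DataFrame, n_rows: int = 10) -> dict:
    """Print a preview of df and return a few basic facts about it.

    Hypothetical helper, not part of the original notebook.
    """
    print(df.head(n_rows))  # first n_rows rows
    print(df.shape)         # (row count, column count)
    print(df.dtypes)        # data type of each column
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        # Total number of missing cells across all columns
        "missing": int(df.isna().sum().sum()),
    }


# Three rows copied from the nutrition preview, for illustration only.
sample = pd.DataFrame(
    {
        "Name": ["Amaranth", "Apple", "Apricot"],
        "Calories (per 100g)": [23, 52, 48],
        "Protein (g)": [2.1, 0.3, 1.4],
    }
)
summary = inspect_frame(sample)
```

With the real data, the call would be `inspect_frame(df_nutrition)`. Returning the missing-value count alongside the printed preview is a design choice that makes gaps visible early.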

## Nutrition Data Summary Statistics
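Show summary statistics (count, mean, standard deviation, min, quartiles, max) for each numeric column in the nutrition dataset. By default, `describe()` skips non-numeric columns such as `Name`.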

In [14]:
print(df_nutrition.describe())

       Calories (per 100g)  Carbohydrates (g)  Protein (g)    Fat (g)  \
count            77.000000          77.000000    77.000000  77.000000   
mean             68.844156          13.505974     1.686364   1.718961   
std              79.861404          11.661530     2.183808   6.478875   
min              13.000000           2.870000     0.200000   0.100000   
25%              26.000000           6.040000     0.800000   0.200000   
50%              48.000000          11.000000     1.100000   0.300000   
75%              77.000000          16.500000     1.900000   0.500000   
max             553.000000          74.970000    18.220000  43.850000   

       Fiber (g)  Vitamin C (mg)  Zinc (mg)  Potassium (mg)  Iron (mg)  \
count  77.000000       77.000000  77.000000       77.000000  77.000000   
mean    2.836364       32.116883   0.397403      278.454545   0.888182   
std     1.966958       48.551572   0.670766      158.338457   1.363415   
min     0.400000        0.500000   0.020000   
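
In the summary above, the maximum of `Calories (per 100g)` (553) sits far above the 75th percentile (77), which suggests a few unusually energy-dense items. One common way to flag such values is the 1.5 × IQR rule; the notebook itself does not use it. The sketch below assumes that rule and a hypothetical helper `iqr_outliers`, and it runs on the ten rows shown in the earlier preview rather than the full file.

```python
import pandas as pd

# Name and calorie values copied from the first ten rows printed earlier.
sample = pd.DataFrame(
    {
        "Name": [
            "Amaranth", "Apple", "Apricot", "Asparagus", "Avocado",
            "Banana", "Beans", "Beetroot", "Bell Pepper", "Bitter Gourd",
        ],
        "Calories (per 100g)": [23, 52, 48, 20, 160, 89, 31, 43, 20, 19],
    }
)


def iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Return rows whose value in `column` lies outside the k * IQR fences.

    Hypothetical helper; k=1.5 is the conventional choice, not a value
    taken from the notebook.
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    mask = (df[column] < q1 - k * iqr) | (df[column] > q3 + k * iqr)
    return df[mask]


outliers = iqr_outliers(sample, "Calories (per 100g)")
print(outliers)
```

On these ten rows only Avocado (160) falls outside the fences. Results on the full 77-row dataset may differ, since the quartiles change with the data.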

## 🍎 Inspect Fruit Prices 2022
Display the first 10 rows, shape, and data types of the fruit prices dataset.

In [13]:
# Load the fruit prices CSV file into a DataFrame (read once; the extra `df` copy was unused)
df_fruit = pd.read_csv('data/Fruit-Prices-2022.csv')

# Inspect the first rows, the shape, and the column data types
print(df_fruit.head(10))
print(df_fruit.shape)
print(df_fruit.dtypes)


                                Fruit    Form  RetailPrice RetailPriceUnit  \
0                              Apples   Fresh       1.8541       per pound   
1                  Apples, applesauce  Canned       1.1705       per pound   
2              Apples, ready-to-drink   Juice       0.8699        per pint   
3          Apples, frozen concentrate   Juice       0.6086        per pint   
4                            Apricots   Fresh       3.6162       per pound   
5           Apricots, packed in juice  Canned       1.8645       per pound   
6  Apricots, packed in syrup or water  Canned       2.2362       per pound   
7                            Apricots   Dried       7.6611       per pound   
8                             Bananas   Fresh       0.5971       per pound   
9                      Berries, mixed  Frozen       4.2673       per pound   

   Yield  CupEquivalentSize CupEquivalentUnit  CupEquivalentPrice  
0   0.90             0.2425            pounds              0.4996  
1   1

## Fruit Prices 2022 Summary Statistics
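Show summary statistics for each numeric column in the fruit prices dataset. Text columns such as `Fruit`, `Form`, and `RetailPriceUnit` are skipped by `describe()` by default.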

In [15]:
print(df_fruit.describe())

       RetailPrice      Yield  CupEquivalentSize  CupEquivalentPrice
count    62.000000  62.000000          62.000000           62.000000
mean      2.994571   0.876129           1.704984            1.065056
std       2.269393   0.174979           2.949262            0.578325
min       0.382000   0.460000           0.123200            0.242900
25%       1.364225   0.722500           0.322450            0.639300
50%       2.159250   0.980000           0.363800            1.008250
75%       4.116525   1.000000           0.540100            1.353475
max      10.303500   1.000000           8.000000            3.555800
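
The preview of this dataset shows that `RetailPrice` mixes units (`per pound` and `per pint`), so the overall mean in the table above blends both. A possible next step is to summarize prices per unit and form. The sketch below uses the ten preview rows as sample data, so it runs without the CSV file; the variable names `sample` and `mean_price` are illustrative.

```python
import pandas as pd

# Columns copied from the first ten rows of the fruit prices preview.
sample = pd.DataFrame(
    {
        "Fruit": [
            "Apples", "Apples, applesauce", "Apples, ready-to-drink",
            "Apples, frozen concentrate", "Apricots",
            "Apricots, packed in juice", "Apricots, packed in syrup or water",
            "Apricots", "Bananas", "Berries, mixed",
        ],
        "Form": [
            "Fresh", "Canned", "Juice", "Juice", "Fresh",
            "Canned", "Canned", "Dried", "Fresh", "Frozen",
        ],
        "RetailPrice": [
            1.8541, 1.1705, 0.8699, 0.6086, 3.6162,
            1.8645, 2.2362, 7.6611, 0.5971, 4.2673,
        ],
        "RetailPriceUnit": [
            "per pound", "per pound", "per pint", "per pint", "per pound",
            "per pound", "per pound", "per pound", "per pound", "per pound",
        ],
    }
)

# Average retail price within each (unit, form) group, so that
# per-pound and per-pint prices are never averaged together.
mean_price = (
    sample.groupby(["RetailPriceUnit", "Form"])["RetailPrice"]
    .mean()
    .round(4)
)
print(mean_price)
```

With the full dataset, `sample` would be replaced by `df_fruit`.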


## 🥦 Inspect Vegetable Prices 2022
Display the first 10 rows, shape, and data types of the vegetable prices dataset.

In [12]:
# Load the vegetable prices CSV file into a DataFrame (read once; the extra `df` copy was unused)
df_vegetable = pd.read_csv('data/Vegetable-Prices-2022.csv')

# Inspect the first rows, the shape, and the column data types
print(df_vegetable.head(10))
print(df_vegetable.shape)
print(df_vegetable.dtypes)


      Vegetable    Form  RetailPrice RetailPriceUnit   Yield  \
0  Acorn squash   Fresh       1.2136       per pound  0.4586   
1     Artichoke   Fresh       2.4703       per pound  0.3750   
2     Artichoke  Canned       3.4498       per pound  0.6500   
3     Asparagus   Fresh       2.9531       per pound  0.4938   
4     Asparagus  Canned       3.4328       per pound  0.6500   
5     Asparagus  Frozen       6.8212       per pound  1.0335   
6      Avocados   Fresh       2.6737       per pound  0.7408   
7         Beets  Canned       1.1431       per pound  0.6500   
8   Black beans  Canned       1.2387       per pound  0.6500   
9   Black beans   Dried       1.5250       per pound  2.4692   

   CupEquivalentSize CupEquivalentUnit  CupEquivalentPrice  
0             0.4519            pounds              1.1961  
1             0.3858            pounds              2.5415  
2             0.3858            pounds              2.0476  
3             0.3968            pounds             

## Vegetable Prices 2022 Summary Statistics
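Show summary statistics for each numeric column in the vegetable prices dataset. Text columns such as `Vegetable`, `Form`, and `RetailPriceUnit` are skipped by `describe()` by default.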

In [16]:
print(df_vegetable.describe())

       RetailPrice      Yield  CupEquivalentSize  CupEquivalentPrice
count    93.000000  93.000000          93.000000           93.000000
mean      2.107941   0.959997           0.338635            0.828323
std       1.082376   0.494943           0.066544            0.508993
min       0.797000   0.375000           0.154300            0.221500
25%       1.269100   0.650000           0.297600            0.539700
50%       1.843300   0.881800           0.341700            0.701800
75%       2.494100   0.970200           0.385800            0.937300
max       6.821200   2.539700           0.540100            2.619100
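
The price columns appear to be related: in the preview rows, `CupEquivalentPrice` is close to `RetailPrice * CupEquivalentSize / Yield`. That relationship is an assumption inferred from the printed values, not something the notebook states. The sketch below checks it on the three vegetable rows whose values are fully visible in the preview; `RecomputedPrice`, `AbsDiff`, and `max_diff` are illustrative names.

```python
import pandas as pd

# The first three rows of the vegetable preview, copied by hand.
sample = pd.DataFrame(
    {
        "Vegetable": ["Acorn squash", "Artichoke", "Artichoke"],
        "Form": ["Fresh", "Fresh", "Canned"],
        "RetailPrice": [1.2136, 2.4703, 3.4498],
        "Yield": [0.4586, 0.3750, 0.6500],
        "CupEquivalentSize": [0.4519, 0.3858, 0.3858],
        "CupEquivalentPrice": [1.1961, 2.5415, 2.0476],
    }
)

# Assumed relationship (inferred, not confirmed by the notebook):
# price per cup = price per unit * units per cup / edible yield
sample["RecomputedPrice"] = (
    sample["RetailPrice"] * sample["CupEquivalentSize"] / sample["Yield"]
)

# Absolute gap between the recomputed and the published cup price
sample["AbsDiff"] = (
    sample["RecomputedPrice"] - sample["CupEquivalentPrice"]
).abs()

max_diff = sample["AbsDiff"].max()
print(sample[["Vegetable", "Form", "CupEquivalentPrice", "RecomputedPrice"]])
print(max_diff)
```

On these rows the recomputed and published values agree to about three decimal places, with the small gaps plausibly due to rounding in the published inputs. Running the same check on `df_vegetable` and `df_fruit` would show whether the relationship holds across the full files.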
