# Exploratory Data Analysis
## Learning Goals
- Load data from an external source into a pandas DataFrame
- Use pandas methods to inspect and understand the structure and quality of a dataset
- Generate visualizations to explore patterns, distributions, and relationships in the data


In [None]:
!pip install pandas numpy seaborn matplotlib

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

## Loading in Data

In [None]:
fire_json = """
{
  "fires": [
    {
      "FireYear": 2015,
      "FireName": "Bass 497",
      "EstTotalAcres": 3.2,
      "HumanOrLightning": "Human",
      "County": "Klamath"
    },
    {
      "FireYear": 2022,
      "FireName": "Hay Fire",
      "EstTotalAcres": 0.2,
      "HumanOrLightning": "Human",
      "County": "Klamath"
    },
    {
      "FireYear": 2000,
      "FireName": "Slick Ear #2",
      "EstTotalAcres": 0.75,
      "HumanOrLightning": "Lightning",
      "County": "Grant"
    }
  ]
}
"""




## Your Turn: Loading Data
- Go to https://data.gov/ and pick a CSV dataset, then load it into a DataFrame and save it to a variable for later use.
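As a starting point, here is a minimal sketch of loading CSV data with `pd.read_csv`. The CSV text and the names `csv_text` and `my_data` are placeholders, not a real data.gov file; `io.StringIO` stands in for a downloaded file so the cell runs without network access.

```python
import io
import pandas as pd

# Placeholder CSV text standing in for a downloaded file (illustrative only)
csv_text = """FireYear,FireName,EstTotalAcres,HumanOrLightning,County
2015,Bass 497,3.2,Human,Klamath
2022,Hay Fire,0.2,Human,Klamath
2000,Slick Ear #2,0.75,Lightning,Grant
"""

# read_csv accepts a file path, a URL, or a file-like object such as StringIO
my_data = pd.read_csv(io.StringIO(csv_text))
print(my_data)

# With a real download, pass the path instead, for example:
# my_data = pd.read_csv("my_dataset.csv")  # hypothetical file name
```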

# Exploratory Data Analysis (EDA)

- Explore and summarize data to:
  - Gain insights  
  - Discover relationships between variables  
  - Identify outliers and missing values  
  - Recognize patterns and trends  
  - Inform the selection of appropriate modeling techniques  


In [None]:
# head() returns the first five rows of a DataFrame by default
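A minimal sketch of `head()`, using a hypothetical eight-row toy DataFrame (the columns `row_id` and `value` are made up) so the five-row default is visible:

```python
import pandas as pd

# Hypothetical toy data with eight rows (illustrative only)
df = pd.DataFrame({
    "row_id": range(1, 9),
    "value": [10, 20, 30, 40, 50, 60, 70, 80],
})

# With no argument, head() returns the first five rows
first_rows = df.head()
print(first_rows)
```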


In [None]:
# tail() returns the last five rows of a DataFrame by default
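A minimal sketch of `tail()`, again on a hypothetical eight-row toy DataFrame (made-up columns `row_id` and `value`):

```python
import pandas as pd

# Hypothetical toy data with eight rows (illustrative only)
df = pd.DataFrame({
    "row_id": range(1, 9),
    "value": [10, 20, 30, 40, 50, 60, 70, 80],
})

# With no argument, tail() returns the last five rows
last_rows = df.tail()
print(last_rows)
```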


In [None]:
# head(n) and tail(n) return a specific number of rows
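Both `head` and `tail` accept a row count. A minimal sketch on a hypothetical toy DataFrame (made-up columns `row_id` and `value`):

```python
import pandas as pd

# Hypothetical toy data with eight rows (illustrative only)
df = pd.DataFrame({
    "row_id": range(1, 9),
    "value": [10, 20, 30, 40, 50, 60, 70, 80],
})

# Pass the number of rows you want as the argument
first_three = df.head(3)
last_two = df.tail(2)
print(first_three)
print(last_two)
```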


In [None]:
# shape returns a tuple with the total number of rows and columns
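A minimal sketch of `shape`, using the three fire records from the JSON example rebuilt by hand so the cell runs on its own. Note that `shape` is an attribute, not a method, so it takes no parentheses.

```python
import pandas as pd

# The three sample fire records from the JSON example
df = pd.DataFrame({
    "FireYear": [2015, 2022, 2000],
    "FireName": ["Bass 497", "Hay Fire", "Slick Ear #2"],
    "EstTotalAcres": [3.2, 0.2, 0.75],
    "HumanOrLightning": ["Human", "Human", "Lightning"],
    "County": ["Klamath", "Klamath", "Grant"],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")

# The column names are available through the columns attribute
print(df.columns.tolist())
```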


## Types of Data

- **Categorical**: Data that falls into one of a limited number of distinct groups.
  - **Nominal**: Categories with no inherent order (e.g., vehicle types).
  - **Ordinal**: Categories with a meaningful order but no consistent numeric distance between them (e.g., disagree, neutral, agree, strongly agree).

- **Numeric**: Data represented by numbers (integers or floats) where mathematical operations make sense. Values can be discrete (counts) or continuous (measurements).
  - **Interval**: Ordered numeric data where differences between values are meaningful, but there is no true zero (e.g., temperature in Celsius).
  - **Ratio**: Numeric data with a true zero, where both differences and ratios are meaningful (e.g., height, weight).

- **Dates**: Can be treated as either categorical or numeric depending on context.
  - **Categorical**: When broken into components such as day of the week, month, or year.
  - **Numeric**: When measured as time elapsed from a fixed reference point (e.g., days since 2025-01-01).


In [None]:
# Getting features (columns) by data type
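One way to split columns by type is `select_dtypes`. In this minimal sketch, every non-numeric column is treated as categorical, which is an assumption that suits the sample fire records but may not suit every dataset (dates, for example, would need separate handling).

```python
import pandas as pd

# The three sample fire records from the JSON example
df = pd.DataFrame({
    "FireYear": [2015, 2022, 2000],
    "FireName": ["Bass 497", "Hay Fire", "Slick Ear #2"],
    "EstTotalAcres": [3.2, 0.2, 0.75],
    "HumanOrLightning": ["Human", "Human", "Lightning"],
    "County": ["Klamath", "Klamath", "Grant"],
})

# dtypes shows the type pandas inferred for each column
print(df.dtypes)

# Numeric columns (integers and floats)
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Everything that is not numeric is treated as categorical here
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print("Numeric:", numeric_cols)
print("Categorical:", categorical_cols)
```

`FireYear` is stored as an integer, so pandas reports it as numeric even though a year can also be treated as a category.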


In [None]:
# Categorical Exploration 
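For categorical columns, counting how often each category appears is usually the first step. A minimal sketch on the sample fire records (the variable names are illustrative):

```python
import pandas as pd

# The three sample fire records from the JSON example
df = pd.DataFrame({
    "FireYear": [2015, 2022, 2000],
    "FireName": ["Bass 497", "Hay Fire", "Slick Ear #2"],
    "EstTotalAcres": [3.2, 0.2, 0.75],
    "HumanOrLightning": ["Human", "Human", "Lightning"],
    "County": ["Klamath", "Klamath", "Grant"],
})

# Frequency of each category, most common first
county_counts = df["County"].value_counts()
print(county_counts)

# The distinct values in a column
causes = df["HumanOrLightning"].unique()
print(causes)

# The number of distinct values in a column
n_counties = df["County"].nunique()
print(n_counties)
```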




In [None]:
# Numeric Exploration
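For numeric columns, individual summary methods give a quick sense of center and range. A minimal sketch on the `EstTotalAcres` column of the sample fire records:

```python
import pandas as pd

# The three sample fire records from the JSON example
df = pd.DataFrame({
    "FireYear": [2015, 2022, 2000],
    "FireName": ["Bass 497", "Hay Fire", "Slick Ear #2"],
    "EstTotalAcres": [3.2, 0.2, 0.75],
    "HumanOrLightning": ["Human", "Human", "Lightning"],
    "County": ["Klamath", "Klamath", "Grant"],
})

acres = df["EstTotalAcres"]

# Measures of center
mean_acres = acres.mean()
median_acres = acres.median()

# Range of the data
min_acres = acres.min()
max_acres = acres.max()

print(f"mean={mean_acres:.2f}, median={median_acres}, min={min_acres}, max={max_acres}")
```

A mean well above the median, as here, is a hint that the column may be skewed toward large values.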


In [None]:
# Describes the Categorical Data
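A minimal sketch of summarizing the categorical columns with `describe`. Excluding numeric types is one way to select them; it assumes every non-numeric column should be treated as categorical.

```python
import numpy as np
import pandas as pd

# The three sample fire records from the JSON example
df = pd.DataFrame({
    "FireYear": [2015, 2022, 2000],
    "FireName": ["Bass 497", "Hay Fire", "Slick Ear #2"],
    "EstTotalAcres": [3.2, 0.2, 0.75],
    "HumanOrLightning": ["Human", "Human", "Lightning"],
    "County": ["Klamath", "Klamath", "Grant"],
})

# Excluding numeric types leaves the text columns; for these, describe()
# reports count, unique (distinct values), top (most common), and freq
categorical_summary = df.describe(exclude=[np.number])
print(categorical_summary)
```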


In [None]:
# Describes the numeric data
# count - the number of non-missing values
# mean - the average (mean) value
# std - the standard deviation
# min - the minimum value
# 25% - the 25th percentile
# 50% - the 50th percentile (median)
# 75% - the 75th percentile
# max - the maximum value
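A minimal sketch of `describe()` on the sample fire records. By default it summarizes only the numeric columns.

```python
import pandas as pd

# The three sample fire records from the JSON example
df = pd.DataFrame({
    "FireYear": [2015, 2022, 2000],
    "FireName": ["Bass 497", "Hay Fire", "Slick Ear #2"],
    "EstTotalAcres": [3.2, 0.2, 0.75],
    "HumanOrLightning": ["Human", "Human", "Lightning"],
    "County": ["Klamath", "Klamath", "Grant"],
})

# With no arguments, describe() summarizes the numeric columns only
numeric_summary = df.describe()
print(numeric_summary)

# Individual statistics can be looked up by row label and column name
median_acres = numeric_summary.loc["50%", "EstTotalAcres"]
print(median_acres)
```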


In [None]:
# Missing Values 
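A minimal sketch of counting missing values with `isna()`. The sample fire records have no gaps, so a few values are blanked out here purely for illustration.

```python
import numpy as np
import pandas as pd

# Sample fire records with some values deliberately removed (illustrative only)
df = pd.DataFrame({
    "FireName": ["Bass 497", "Hay Fire", "Slick Ear #2"],
    "EstTotalAcres": [3.2, np.nan, 0.75],
    "County": ["Klamath", None, None],
})

# isna() marks each missing cell as True; sum() counts them per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Sort so the columns with the most missing values come first
missing_sorted = missing_per_column.sort_values(ascending=False)
print(missing_sorted)

# Share of missing values per column, as a percentage
missing_percent = df.isna().mean() * 100
print(missing_percent.round(1))
```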


## Your Turn: Exploring the Shape of Your Data

Use the methods discussed above to create a short exploratory report of your dataset.

Your report should include:

1. **Basic Structure**
   - The number of rows and columns  
   - A preview of the first few rows  
   - A list of all column names  

2. **Data Types**
   - Identify which columns are numeric  
   - Identify which columns are categorical  

3. **Summary Statistics**
   - Summarize the numeric columns  
   - Summarize the categorical columns  

4. **Missing Values**
   - Determine whether any columns contain missing data  
   - Identify which columns have the most missing values  

5. **Reflection** (you will share this with the class later)
   - Write 3–5 sentences describing:
     - What you learned about the dataset
     - Any patterns or irregularities you noticed


## Matplotlib Crash Course

In [None]:
# Docs: https://matplotlib.org/
# Set the size of your figure
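A minimal sketch of setting the figure size with `plt.subplots`. The `figsize` argument is `(width, height)` in inches; the 8 by 4 size and the labels are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

# Values taken from the three sample fire records
years = [2015, 2022, 2000]
acres = [3.2, 0.2, 0.75]

# figsize is (width, height) in inches
fig, ax = plt.subplots(figsize=(8, 4))

# A simple scatter plot so the figure has something to show
ax.scatter(years, acres)
ax.set_xlabel("FireYear")
ax.set_ylabel("EstTotalAcres")
ax.set_title("Estimated acres by fire year")
```

In a notebook the figure renders at the end of the cell; in a plain script, call `plt.show()` afterwards.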

In [None]:
# pandas plotting methods use matplotlib under the hood
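Because pandas plotting is built on matplotlib, `DataFrame.plot` returns a matplotlib Axes that can be customized further. A minimal sketch on the sample fire records, with an arbitrary figure size and title:

```python
import matplotlib.pyplot as plt
import pandas as pd

# The three sample fire records from the JSON example
df = pd.DataFrame({
    "FireYear": [2015, 2022, 2000],
    "FireName": ["Bass 497", "Hay Fire", "Slick Ear #2"],
    "EstTotalAcres": [3.2, 0.2, 0.75],
    "HumanOrLightning": ["Human", "Human", "Lightning"],
    "County": ["Klamath", "Klamath", "Grant"],
})

# DataFrame.plot draws with matplotlib and returns the Axes it drew on
ax = df.plot(kind="scatter", x="FireYear", y="EstTotalAcres", figsize=(6, 4))

# The returned Axes accepts the usual matplotlib calls
ax.set_title("Fire size by year")
```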


In [None]:
# Histograms for numeric data
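A histogram needs more than a handful of points to be informative, so this minimal sketch uses 200 randomly generated, hypothetical acreage values rather than the three sample records. The distribution, seed, and bin count are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical, randomly generated acreage values (not real fire data)
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"EstTotalAcres": rng.exponential(scale=2.0, size=200)})

fig, ax = plt.subplots(figsize=(8, 4))

# hist() groups the values into bins and returns the count in each bin
counts, bin_edges, _ = ax.hist(df["EstTotalAcres"], bins=20)

ax.set_xlabel("EstTotalAcres")
ax.set_ylabel("Number of fires")
ax.set_title("Distribution of estimated acres")
```

A long tail on the right side of the histogram indicates a right-skewed column.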


In [None]:
# Bar charts for categorical data
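For a categorical column, a common approach is to count each category with `value_counts()` and plot the counts as bars. A minimal sketch on the `County` column of the sample fire records:

```python
import matplotlib.pyplot as plt
import pandas as pd

# The three sample fire records from the JSON example
df = pd.DataFrame({
    "FireYear": [2015, 2022, 2000],
    "FireName": ["Bass 497", "Hay Fire", "Slick Ear #2"],
    "EstTotalAcres": [3.2, 0.2, 0.75],
    "HumanOrLightning": ["Human", "Human", "Lightning"],
    "County": ["Klamath", "Klamath", "Grant"],
})

# Count the fires in each county, most common first
county_counts = df["County"].value_counts()

# One bar per category, with height equal to the frequency
ax = county_counts.plot(kind="bar", figsize=(6, 4))
ax.set_xlabel("County")
ax.set_ylabel("Number of fires")
ax.set_title("Fires per county")
```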


## Your Turn: Visualizing Your Data

Create visualizations to explore patterns and relationships in your dataset.

### 1. Histograms (Numeric Data)
- Identify all numeric columns.
- Create a histogram for each numeric feature.

### 2. Bar Charts (Categorical Data)
- Identify all categorical columns.
- Create a bar chart showing the frequency of each category.

### 3. Scatter Plots
- Select at least two numeric features.
- Create simple scatter plots to explore relationships.

### 4. Reflection
Note: You will share this with the class later.

Write 3–5 sentences answering:
- What patterns did you notice?
- Which variables appear related?
- Were any variables heavily skewed?
- Did you observe outliers?


Let's share our observations!