# 3-Step Process for Analyzing Your Data (General Case)
1. Define your goal, why it matters, your target variable, and your initial hypotheses (write these down)
2. Get to know your data: obtain a 10,000 ft view of your data before diving in (write down what you learn/discover)
3. Answer your initial hypotheses with visuals and statistics (write down takeaways as you go)

## Step 1: Define your goal. What's the impact? Who benefits? Why does this matter? 
- Take *lots* of written or typed notes here. 
- What, exactly, are you seeking to understand?
- Define your target variable. Does one exist in your dataset?
- Define your initial hypotheses: literally write them down
    - Are the stakeholders hoping to confirm or deny something already? That's a good start.
    - What initial hunches do you have? How could you confirm or deny them?
    - Are there industry-based hunches you can confirm or deny?
    - If you have data from your industry, how does your dataset compare?
    - Based on the columns you have, do you see anything you *know* you need to look at?
- Define your Minimum Viable Product (MVP)
    - How do you know when you're done?
    - How do you know when you've got something to deliver?
    - If something seems interesting but out of scope, add it to your backlog and get back to the heart of the matter

### How to Generate Initial Hypotheses  

Think and write down your thoughts

![](hypothesis_generation.png)

### Defining Your Minimum Viable Product (MVP)

The real MVP is insight and takeaways that stakeholders can use to improve their decision-making.

What would a Minimum Viable Product look like? You likely won't have 6 months to produce answers.

![](mvp.png)

## Step 2: Get to know your data at a high level
- *Dump out all your legos and take inventory of what you have*
- Take lots of written or typed notes as you move through these steps
- Hunt down a data dictionary that explains what each column of your data is or represents.
    - This may mean talking to people
    - This may mean having coffee with someone from accounting to pick their brain
- Determine whether any columns need cleaning or a change in data type
- Hunt for any nulls or missing data. Write down how you handle them. There's no one right answer.
- What's the distribution of your target variable? What about your most interesting variables?
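
The checks above can be sketched in a few lines of pandas. This is only a minimal sketch: the toy DataFrame and its column names (`age`, `department`, `daily_rate`, `left_company`) are made up for illustration, and your own dataset will call for its own choices about types and missing values.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset (column names are illustrative)
df = pd.DataFrame({
    "age": [41, 49, 37, None, 27, 32],
    "department": ["Sales", "R&D", "R&D", "Sales", None, "R&D"],
    "daily_rate": ["1102", "279", "1373", "1392", "591", "1005"],  # numbers stored as text
    "left_company": ["Yes", "No", "Yes", "No", "No", "No"],        # target variable
})

# Column names, data types, and non-null counts in one view
df.info()

# A column of numeric strings needs a change in data type
df["daily_rate"] = pd.to_numeric(df["daily_rate"])

# Count missing values per column, largest first
null_counts = df.isna().sum().sort_values(ascending=False)
print(null_counts)

# Distribution of the target variable, as proportions
target_share = df["left_company"].value_counts(normalize=True)
print(target_share)

# Summary statistics for the numeric columns
print(df.describe())
```

Whatever you decide to do with the nulls this turns up (drop, fill, or leave alone), write the decision down next to the counts.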

![](storytelling.webp)

## Step 3: Answer your initial hypotheses with visuals and statistics
- Always write down your takeaways as you learn or reveal them
- Use visuals, descriptive stats, and inferential stats to answer your hypotheses
- Start with one hypothesis at a time.
- Visualize and get stats on the population
- Then create subgroups, and compare each subgroup to the population and to other subgroups
- Focus on getting the biggest bang for your buck rather than counting toenails on an ant.
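
Here is one way the population-then-subgroups comparison might look in pandas. It is a sketch under assumptions: the data is invented, and the column names (`department`, `overtime`, `left_company`) are placeholders, not part of any particular dataset.

```python
import pandas as pd

# Hypothetical toy data; 1 in left_company means the employee left
df = pd.DataFrame({
    "department": ["Sales"] * 4 + ["R&D"] * 6,
    "overtime": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "No"],
    "left_company": [1, 1, 0, 1, 1, 0, 0, 0, 0, 0],
})

# Population baseline: overall share of employees who left
population_rate = df["left_company"].mean()

# Subgroup vs. population: rate and group size per department
by_dept = df.groupby("department")["left_company"].agg(["mean", "count"])
by_dept["diff_from_population"] = by_dept["mean"] - population_rate
print(by_dept)

# Subgroup vs. subgroup: rate for each department/overtime combination
by_dept_overtime = (
    df.groupby(["department", "overtime"])["left_company"].mean().unstack()
)
print(by_dept_overtime)
```

Keeping the `count` column next to each rate is a cheap guard against reading too much into a subgroup with only a handful of rows.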

![](descriptive-and-inferential-statistics.jpeg)

### Explore Relationships Between Variables and the Target Variable
![](hypothesis.jpeg)

# 3-Step Process for Data Analysis (Specific Case with HR Attrition Data)
1. Define your goal, why it matters, your target variable, and your initial hypotheses (write these down)
2. Get to know your data: obtain a 10,000 ft view of your data before diving in (write down what you learn/discover)
3. Answer your initial hypotheses with visuals and statistics (write down takeaways as you go)

In [14]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("attrition.csv")

In [7]:
df.head(2).T

| Unnamed: 0 | 0 | 1 |
|---|---|---|
| Age | 41 | 49 |
| Attrition | Yes | No |
| BusinessTravel | Travel_Rarely | Travel_Frequently |
| DailyRate | 1102 | 279 |
| Department | Sales | Research & Development |
| DistanceFromHome | 1 | 8 |
| Education | 2 | 1 |
| EducationField | Life Sciences | Life Sciences |
| EmployeeCount | 1 | 1 |
| EmployeeNumber | 1 | 2 |
