---
jupytext:
  formats: md:myst
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# AI-Powered Data Analysis

In this notebook, you'll follow a guided walkthrough of how GenAI can be leveraged in the data analysis process. By the end, you'll have an open sandbox where you can experiment in this Python environment with your own AI-generated code. 

:::{attention} `Shift` + `Enter`
Remember that you can move forward in the notebook by pressing `Shift` + `Enter`. This is also how you run code cells. Alternatively, click in a cell and press the ▶️ button at the top of the notebook to run its code.
:::

### Learning to Think Step-by-Step in Data Analysis

The main thing I want you to learn is to tackle data analysis in a, well, analytical way, breaking it down into clear steps:

1. **Contextualize:** Identify your context. What data are you working with? What's the general topic of what you're trying to do? This initial context is crucial for framing your analysis.

2. **Set a Goal:** What is your general goal? What do you want to achieve with this data analysis? Having a clear objective is key.

3. **Strategize:** With that goal in mind, what's a good strategy for getting there? What tools do you want to use? What approaches or methods? What order do you need to do things in? Planning your approach ensures better outcomes.

4. **Implement:** A carefully articulated strategy makes this main step, where you actually apply your tools and methods, much easier. Many people skip to this step without doing the work beforehand!

5. **Interact:** This is the iterative process of data analysis. You will need to interpret your results, troubleshoot if things aren't going your way, or iterate to get a more refined version of what you want.

6. **Document:** Once you've got what you want, make sure you record—in detail—what you did to get there! This is important both for sharing data and for revisiting your analyses later yourself.

### The Role of AI in Data Analysis

How you interact with AI in the context of data analysis will depend on the AI tools you're using, the data, and your experience, among other factors. A useful way to direct your interactions is to consider different roles that AI can take on in this process. Each role roughly corresponds to a mode of interaction and to your comfort and experience level with the specific analysis you want to do.

| Role       | Mode       | Level       |
|:----------:|:----------:|:-----------:|
| Tutor      | Learning   | Novice      |
| Co-pilot   | Exploring  | Intermediate|
| Intern     | Producing  | Expert      |

Take these levels with a grain of salt because you might be experienced or advanced in some areas of data analysis but want to engage with AI as a tutor to learn a new analysis or use a new package.

In the walkthrough, you'll see tabs for these different roles/modes of engaging with AI at each step.

# Step 1: Contextualize

## Set up your GenAI tool

It's important to set up whatever your tool is with the context of your dataset and what you'll be engaging in.

::::{tab-set}

:::{tab-item} Tutor (Learning Mode)
:sync: tab-tutor-contextualize
```
Act as a Data Analysis Tutor to provide a strong educational foundation for my data analysis project.

Responsibilities:

1. Educate: Explain each step and decision clearly.
2. Guide: Use the provided CSV file for illustrations and answering questions.
3. Respond Patiently: Answer queries with clear, instructive insights, waiting for my cues.
4. Review: Discuss errors or misconceptions post-evaluation.
5. Confirm: Paraphrase my instructions to ensure alignment.

Working Environment: Jupyter Notebook.

Paraphrase my instructions to verify your comprehension.
```
:::

:::{tab-item} Co-pilot (Exploring Mode)
:sync: tab-copilot-contextualize
```
Serve as a Data Analysis Copilot to navigate my data analysis project.

Responsibilities:

1. Collaborate: Understand data nuances influencing our analysis.
2. Integrate: Use the provided CSV file in our workspace for discussions.
3. Dialogue: Engage in a two-way interaction, pausing for my input.
4. Review: Jointly assess results, considering improvements.
5. Confirm: Echo my directives to ensure synchronization.

Working Environment: Jupyter Notebook.

Echo my objectives back to confirm alignment.

```
:::

:::{tab-item} Intern (Producing Mode)
:sync: tab-intern-contextualize
```
Function as a Data Analysis Intern, executing tasks I delegate.

Responsibilities:

1. Query: Request details influencing task outcomes.
2. Execute: Load and apply the provided CSV file as instructed.
3. Conform: Follow instructions strictly, without introducing new steps.
4. Feedback: Confirm if steps align with objectives post-execution.
5. Repeat: Echo my instructions to demonstrate adherence.

Working Environment: Jupyter Notebook.

Retell my commands to confirm accurate following.
```
:::

::::

And, yes, these prompts were generated and iterated on with AI!

:::{attention} Prompts are not magic or universal
:class: dropdown

These suggested prompts are a _starting point_, but you'll have to actually put some thought into what makes sense for you to include in your prompt:
- How big is your context window? (i.e. how much text can you put in there)
- Does your tool have a tendency to give verbose (long, wordy) replies?
- Can you access other settings, such as the system prompt?

**There are no magic words that will reliably get you a perfect result from an AI chatbot**. Even when you do find something close to a "perfect prompt", it may stop working after the model is updated or some other aspect of the tool's design is changed.

Each of the factors listed above affects how to get the most out of your AI tool. This doesn't even cover the fact that many IDEs now incorporate GenAI into their products, meaning you can often talk to GPT-4, Gemini, and other AI models directly from your notebook.

Instead of focusing on optimizing for the current capabilities of the AI tools around you, focus on understanding _how_ you can delegate and automate aspects of data analysis given the components of that process, i.e., the steps in this exercise!

That said, if you're very interested in this topic, there are courses specifically on _prompt engineering_ and there are frequently new academic and media articles that come out with information on the latest prompt engineering insights.
:::

Depending on your tool of choice, you may not be able to refer to a CSV file or have it run code with that CSV file. Below are some suggested initializing prompts from our readings if you want ideas of what direction to go in for adapting your prompt.

::::{note} Adapting context depending on tool capabilities
:class: dropdown

Below are some example prompts from the Step 0: Context and Setup reading if you need a refresher on what to consider for tool-specific prompting. In particular, consider whether you can upload files and whether you have access to a code interpreter.

:::{seealso} Basic (no code interpreter or file upload)
:class: dropdown

Start the conversation off by specifying your situation and what you’ll be trying to do. I like to prompt with a role I want the GenAI bot to take on.

An example of what that initial prompt might look like if you can't upload your data:

```
Role: Act as a Data Analysis Copilot, providing advice and educational explanations on how to approach my data analysis project.

Responsibilities:

Inquire and Clarify: Ask about details that can impact your advice (e.g., data types, dataframe or variable attributes).

Contextual Understanding: Use the provided pasted data as context for answering my questions.

Here is the data:
<data>
{paste in some data here,
depending on context window,
it may only be a few lines}
</data>

Direct Responses: Answer my questions directly and do not proceed with additional steps until I explicitly ask.

Concise and Educational Explanations: Provide concise explanations, discuss the general consensus on different options, and give clear recommendations on how I should proceed, explaining the reasoning behind your advice.

Verification Guidance: Provide instructions on how I can verify that the code works and achieves the intended goal.

Working Environment: I am using a Jupyter notebook for my work.

Repeat back the instructions I have given to ensure understanding.
```
The last sentence is mainly so that you can separate your first actual query from this role setting stage, and it should give you an idea of how the model is interpreting your instructions. 

The details of this are beyond the scope of this short course, but you can think of it this way: your input determines your output, and priming the conversation by giving context will influence the output.

Feel free to copy this template and adjust as needed.

:::
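
If you have to paste data rather than upload it, one way to fill the `<data>` placeholder in the template above is to keep the header plus as many rows as fit a rough size budget. This is only a sketch: the helper name `build_data_snippet`, the sample rows, and the use of a character count as a stand-in for the context window (which tools actually measure in tokens) are all assumptions for illustration.

```python
import csv
import io

# Made-up sample; the column names are illustrative only
sample = 'stars,text\n5,"Great food, friendly staff"\n2,Slow service\n4,Would return\n'


def build_data_snippet(csv_text: str, max_chars: int) -> str:
    """Return the header plus as many complete rows as fit in max_chars."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    kept = []
    used = 0
    for index, record in enumerate(reader):
        # Re-serialize each record so quoted fields stay intact
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerow(record)
        line = buffer.getvalue()
        # The header (index 0) is always kept, even if it exceeds the budget
        if index > 0 and used + len(line) > max_chars:
            break
        kept.append(line)
        used += len(line)
    return "".join(kept)


snippet = build_data_snippet(sample, max_chars=60)
print(snippet)
```

Parsing with the `csv` module instead of splitting on line breaks matters for review text, which often contains commas or newlines inside a quoted field.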

:::{seealso} File upload (no code interpreter)
:class: dropdown
This works the same as above, except that instead of pasting the data in, you attach the file and refer to it as an attached file.


```
Role: Act as a Data Analysis Copilot providing advice and educational explanations on how to approach my data analysis project.

Responsibilities:

1. Inquire and Clarify: Ask about details that can impact your advice (e.g., data types, dataframe or variable attributes).
    
2. Contextual Understanding: Load and use the attached spreadsheet (CSV file) as context for answering my questions.
    
3. Direct Responses: Answer my questions directly and do not proceed with additional steps until I explicitly ask.
    
4. Concise and Educational Explanations: Provide concise explanations, discuss the general consensus on different options, and give clear recommendations on how I should proceed, explaining the reasoning behind your advice.
    
5. Verification Guidance: Provide instructions on how I can verify that the code works and achieves the intended goal.
    

Working Environment: I am using a Jupyter notebook for my work.

Repeat back the instructions I have given to ensure understanding.
```
:::

:::{seealso} Code interpreter
:class: dropdown
This is in some ways the easiest option because you can have it generate and run code for you to do all the work. However, you’ll still need to ask questions to make sure it has done the task correctly. Some of this can be alleviated by priming it to reflect on its answers at the beginning of the conversation with something like: “After running code, revisit my question, critically evaluate your approach, and verify if the output achieved the goal.”

Here’s what the complete first prompt could look like:

```
Role: Act as a Data Analysis Copilot providing advice and educational explanations on how to approach my data analysis project.

Responsibilities:

1. Inquire and Clarify: Ask about details that can impact your advice (e.g., data types, dataframe or variable attributes).
    
2. Contextual Understanding: Load and use the attached CSV file as context for answering my questions.
    
3. Direct Responses: Answer my questions directly and do not proceed with additional steps until I explicitly ask.
    
4. Critical Evaluation: After running code, revisit my question, critically evaluate your approach, and verify if the output achieved the goal.
    
5. Instruction Reiteration: Repeat back the instructions I have given to ensure understanding.
    

Working Environment: I am using a Jupyter notebook for my work.

Repeat back the instructions I have given to ensure understanding.
```
:::

::::

## Understand what you're working with


This is a stage in data analysis where a user's level really makes a difference in how useful AI can be. All groups can leverage GenAI tools for some combination of information retrieval and using the AI as a sounding board. Even an expert who is familiar with the subject matter or the types of analysis done in a particular field may not know the specific dataset. At the very least, AI can help them organize their thoughts.

::::{tab-set}

:::{tab-item} Tutor (Learning Mode)
:sync: tab-tutor-contextualize

Depending on how unfamiliar you are with the general subject matter and the dataset, you may want to start off very broadly by asking which fields work with and analyze this kind of data, what it is about, what you can learn from it, etc. You can also ask how it's formatted and what that means.

Example prompt:
> I don't know anything about the type of data I'm working with here. Can you tell me more about the subject matter and what is represented in the file?

:::

:::{tab-item} Co-pilot (Exploring Mode)
:sync: tab-copilot-contextualize

You may know a little bit about the data. Maybe you've worked with similar things before and you want to think more creatively. You can use AI as a sounding board to compare what is typically done with what is cutting edge or what could be an innovative approach.

Example prompt: 
> What kinds of analyses are usually done with this data? And what could be an interesting novel way to look at it?

:::

:::{tab-item} Intern (Producing Mode)
:sync: tab-intern-contextualize

If you're familiar with the type, format, and field of the data, as well as the kinds of analyses that are usually done, this could be a good time to check what your AI tool knows about this topic as a way to calibrate your expectations.

Example prompt:
> Tell me what you understand about this data.
:::

::::

:::::{danger} Understand the limits of AI tools

You cannot assume that an AI chatbot is giving you reliable information.

Depending on the tool, you may be dealing with outdated information (limited by the cutoff date of the training data), hallucinations (these may be worse in some models and tools than others), and simplistic or biased materials.

**AI tools do not give you a license to stop thinking critically**. In fact, they require even more critical thinking _because_ of how convincing the presentation of the information can be, especially without the context of sources that we can evaluate.

Some strategies to consider:
- **Browser-Assisted Validation 🌐**: Use a tool that has a browser built in and check the sources.
- **Retrieval Augmented Verification 🔍**: Utilize tools with Retrieval Augmented Generation (RAG) technology and upload a document you trust as the source.

Remember, AI tools should be used as a springboard for:
- **Orientation 🔦**: They can help steer you in the right direction if you have no idea where to start.
- **Clarification 📝**: They can help you break down complex information and provide quick replies to points of confusion.

However, you shouldn't use them for:
- **Verbatim Acceptance 🕵️**: Do not accept information at face value.
- **Specialized or Debatable Topics 🚫**: Sometimes a topic is nuanced and there's no clear "truth", and your AI may not reflect that.
:::::

# Step 2: Set a Goal

What is your general goal? What do you want to achieve with this data analysis? Having a clear objective is key.

::::{tab-set}

:::{tab-item} Tutor (Learning Mode)
:sync: tab-tutor-setgoal

You can use AI to get an overview of the kinds of data analysis that are commonly done in a field. You can then guide your AI to direct you towards the most appropriate analysis for your experience level.

Example prompt:
> I am deciding on a goal for my data analysis project.
>
> What kind of research questions do people usually answer with this kind of dataset?
>
> Indicate what each option would entail and how appropriate it would be for a novice in data analysis and someone who is unfamiliar with this dataset.

:::

:::{tab-item} Co-pilot (Exploring Mode)
:sync: tab-copilot-setgoal

You can use AI to reflect on some ideas if you have them. Otherwise, you can just brainstorm some more questions. You can iterate on this step until you find something that piques your interest.

Example prompt:
> I'm thinking of doing a correlation analysis between the variables in this dataset.
>
> What could be some more interesting and unconventional analyses I could do in this project?

:::

:::{tab-item} Intern (Producing Mode)
:sync: tab-intern-setgoal

If you're an expert in some type of analysis or very familiar with the dataset, AI may not be particularly useful to you at this stage. If you plan on using it for the rest of the pipeline, however, simply record your goal and use it as the context for your strategy prompt (see below).

Example (as first part of next prompt):
> My goal is to predict star rating based on available numeric variables and the length of the review.

:::

::::

We'll move forward with the goal of predicting star ratings based on numeric variables and review length.
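
To make that goal concrete, here is a minimal sketch of how the candidate predictors could be identified: find the columns that look numeric and add the review length as a new one. The sample rows and the column names (`stars`, `useful`, `city`, `text`) are made up for illustration and are not the real dataset schema. In practice you would more likely do this with pandas; the standard library is used here only to keep the sketch self-contained.

```python
import csv
import io

# Made-up sample; the column names are assumptions, not the real schema
SAMPLE_CSV = """stars,useful,city,text
5,2,Austin,Great tacos and quick service
2,0,Boston,Cold food
4,1,Austin,Nice patio
"""


def is_number(value: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def numeric_columns(rows: list[dict]) -> list[str]:
    """Columns that have at least one value and whose values all parse as numbers."""
    result = []
    for column in rows[0].keys():
        values = [row[column] for row in rows if row[column] != ""]
        if values and all(is_number(value) for value in values):
            result.append(column)
    return result


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    # Review length in characters, stored as a new candidate predictor
    row["review_length"] = str(len(row["text"]))

# Everything numeric except the target we want to predict
predictors = [column for column in numeric_columns(rows) if column != "stars"]
print(predictors)
```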

# Step 3: Strategize

With your goal in mind, develop a strategy to achieve it. Consider the tools, methods, and order of operations required to execute your plan effectively.

::::{tab-set}

:::{tab-item} Tutor (Learning Mode)
:sync: tab-tutor-strategize

The best way to strategize is to integrate it with learning about each of the steps. You'll want to iterate through each step to make sure you understand it but you can start with a broad overview and then get into the details. 

Example prompt:
> My goal is to predict star rating based on available numeric variables and the length of the review.
>
> Help me understand what steps I need to take in my data analysis pipeline to achieve this, as well as what Python libraries I'll need for this.

Knowing the basics of data analysis, you'll know that you at least want to see steps on wrangling, (statistical) analysis, and visualization.

:::

:::{tab-item} Co-pilot (Exploring Mode)
:sync: tab-copilot-strategize

You can start with the basics and then iterate on alternatives at various steps.   

Example prompt:
> My goal is to predict star rating based on available numeric variables and the length of the review.
>
> Provide a basic overview of the necessary steps in the data analysis pipeline.
>
> Include different approaches and relevant Python packages for each step, highlighting their pros and cons.

:::

:::{tab-item} Intern (Producing Mode)
:sync: tab-intern-strategize

Starting with the context you articulated from your goal, you can now check in and iterate on all the steps you want to go through to build your data analysis pipeline. You can either iterate, or immediately give guidance about how much detail you want in the strategy.

Example prompt:
> My goal is to predict star rating based on available numeric variables and the length of the review.
>
> Provide an overview of the necessary steps in the data analysis pipeline, including statistical methods, visualizations, and relevant Python libraries.

:::

::::

Take time to make sure you're aligned on the strategy. Some AI tools can be a bit overzealous in answering your questions. They may skip directly to implementing _one_ of the things they suggested, but you're in charge: make sure *you* are making the decisions about what to do because you will be responsible for your choices and you will need to explain them.

# Step 4: Implement

It's tempting to get to this point and say "AI, do the thing!" and walk away. To be honest, it may sometimes work, and as tools improve, it will likely work more often.

However, as of this writing, I can tell you that you'll need to supervise the process a lot more closely. Think of this step as your first draft. 

You'll need to break things down to implement your data analysis pipeline.

Two things influence how many steps you'll need:
1. Complexity of your strategy
2. State of your data

If your "project" is as simple as "create a linear regression for predicting X from Y", you can do that in one go. If your data is pristine and in exactly the right format, some tools may spit out exactly what you need. 

But, if you're asking for something more complex and/or your data needs some cleaning to get to where you want to go, AI may trip over the basics in an attempt to provide you with what you requested. 

Break your implementation down into the steps of your data analysis pipeline (based on your strategy) and prompt each step of the way. 
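
As a rough illustration of working in steps, the sketch below splits a tiny version of our goal into a wrangling step and an analysis step, with a checkpoint in between. The data is made up, the function names are hypothetical, and the hand-written least-squares fit is only there to keep the example dependency-free; in a real project a library such as scikit-learn would typically do the fitting.

```python
from statistics import mean

# Made-up rows standing in for the real data
reviews = [
    {"review_length": 10, "stars": 1},
    {"review_length": 20, "stars": 2},
    {"review_length": 30, "stars": 3},
    {"review_length": None, "stars": 5},  # missing value to clean up
]


def drop_incomplete(rows):
    """Step 1 (wrangle): keep rows where both fields are present."""
    return [
        row for row in rows
        if row["review_length"] is not None and row["stars"] is not None
    ]


def fit_line(xs, ys):
    """Step 2 (analyze): ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


clean = drop_incomplete(reviews)
# Checkpoint: verify the wrangling result before moving on to modeling
assert len(clean) == 3

slope, intercept = fit_line(
    [row["review_length"] for row in clean],
    [row["stars"] for row in clean],
)
print(f"stars ~ {slope:.2f} * review_length + {intercept:.2f}")
```

Each function corresponds to one prompt you might give your AI tool, and the checkpoint is the kind of verification you would ask for (or run yourself) before prompting the next step.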

::::{tab-set}

:::{tab-item} Tutor (Learning Mode)
:sync: tab-tutor-implement

You can also use this as an opportunity to learn by asking questions about why AI suggested using a particular approach and what alternatives would be. It can be useful for learning about common (Python) packages as well. 

Example prompt:
> What functions in the scikit-learn package could I use for the regression?

:::

:::{tab-item} Co-pilot (Exploring Mode)
:sync: tab-copilot-implement

There's no need to get experimental directly in the implementation. Unless you feel very confident with the type of analysis _and_ the dataset, it's a good idea to take this as an opportunity to build that first unit test. In the next step, you can iterate and explore different options for the analyses or visualizations.

Example prompt:
> Create a skeleton for a data analysis pipeline that takes our CSV file as input and outputs a PNG file of the model fit plot for the regression.

:::

:::{tab-item} Intern (Producing Mode)
:sync: tab-intern-implement

Consider this your first benchmarking step. If you plan on using this particular AI tool, this is the point at which you will find out how much it can do with the context it's been given so far. If your goal is to save time, you can try and prompt it to be reflective (you can look into Chain-of-Thought prompting) and build that first version of the full pipeline. 

Example prompt:
> Using the strategy outlined above, create a data analysis pipeline in Python.
>
> Break it down into chunks, stating each step and providing the commented code for that step.
>
> Finally, execute the entire pipeline and display the output.

:::

::::


# Step 5: Interact

Really, the implementation and interaction steps go together. Implementing an AI solution is an iterative process. I think it's more useful to view this step as structured by the nature of your interaction than by your level. Here are some ways you might be interacting with your AI:

- **Interpretation:** Many AI tools can help you interpret statistical output. This can be text-based output from your Python console, but in some cases, AI can also interpret figures! I've found this especially helpful when I'm using a type of plot I don't usually work with, or when there are many components and it's hard to keep track of what they mean together.
- **Refining and Scaling:** Once you have a functional pipeline, you can start playing around with the specifics. Do you want your output to be formatted in a particular way? Do you want to run this analysis on multiple files that are similarly formatted?
- **Troubleshooting:** Of course, you may run into trouble. Maybe your first draft didn't even work. In this case, you can often try to simply "regenerate" (some tools literally have some variant of this with some kind of 🔄 icon). Or, more often, you'll need to ask specific questions and dig into the error outputs.
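
For the troubleshooting case, it usually helps to give the AI the complete error output rather than a summary. The sketch below shows one way to capture a full traceback and package it into a prompt. The helper name `build_troubleshooting_prompt` and the prompt wording are hypothetical, and running a snippet with `exec` is only reasonable here because the snippet is your own code.

```python
import traceback


def build_troubleshooting_prompt(code: str, goal: str) -> str:
    """Run a code snippet and, if it fails, package the error for an AI prompt."""
    try:
        exec(code, {})  # run the snippet in a fresh, empty namespace
    except Exception:
        error_text = traceback.format_exc()  # full traceback as a string
        return (
            f"Goal: {goal}\n\n"
            f"Code:\n{code}\n\n"
            f"Full error output:\n{error_text}\n"
            "Explain the likely cause before suggesting a fix."
        )
    return "The code ran without raising an error."


failing_code = "ratings = [5, 4]\nprint(ratings[2])"
prompt = build_troubleshooting_prompt(failing_code, "Print the third rating")
print(prompt)
```

Asking for an explanation before a fix keeps you in the loop, which makes it easier to notice when the suggestions stop making progress.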

:::{error} Beware the doom loop!

I speak from experience when I say that I've gone down the AI rabbit hole with troubleshooting before. You may be getting an error and start prompting AI to figure it out. 

**You need to know when to stop.**

There are any number of reasons that it may not be working: 
- The model doesn't have the necessary knowledge.
- Something about your computer's configuration is different than the code interpreter of the AI tool.
- The AI tool's code interpreter is down.

This is why knowing something about coding and data analysis is useful, even in the age of AI. If you have no idea what's going on, you'll be running the same prompt over and over like you're at the slot machines, gambling to get the winning prize (output).

Some great advice I once got is: **set a time limit**. 

If you run into a troubleshooting loop, set a time limit (15 mins, 30 mins, etc). And once the time is up, if you haven't figured it out _with_ AI, step away from your computer and the problem. 

You can come back to it later with fresh eyes and try to figure out what's going on or take on a different strategy.
:::


# Step 6: Document
As you do your data analysis with your AI tool of choice, remember to document everything!

Don't assume AI will remember what _you_ want it to remember. Keep a notebook. This is good practice for data analysis in general, but also important if you want to benchmark the performance of various models. 
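
If you'd rather not rely on copy-pasting, a small log file is one way to keep that record. The sketch below appends each prompt and response to a JSON Lines file. The function names, the field names, and the `example-tool` label are all hypothetical; the temporary directory only keeps the demo from leaving files behind, and in practice you would point it at a file next to your notebook.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_exchange(log_path: Path, tool: str, prompt: str, response: str) -> None:
    """Append one prompt/response pair to a JSON Lines log file."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "prompt": prompt,
        "response": response,
    }
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")


def read_log(log_path: Path) -> list[dict]:
    """Load every logged exchange, oldest first."""
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "prompt_log.jsonl"
    log_exchange(
        log_file,
        tool="example-tool",
        prompt="Tell me what you understand about this data.",
        response="(paste the model's reply here)",
    )
    entries = read_log(log_file)

print(entries[0]["prompt"])
```

Recording the tool name alongside each prompt makes it possible to replay the same prompts against a different model later and compare the results.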


::::{tab-set}

:::{tab-item} Tutor (Learning Mode)
:sync: tab-tutor-document

You can create a record of what it is that you did each step of the way and use it to review and document your own learning.

Example prompt:
> Document in detail each step we took in this data analysis project. Make sure all code is well-documented and replicable.

:::

:::{tab-item} Co-pilot (Exploring Mode)
:sync: tab-copilot-document

If you experimented with a lot of different approaches, you may want to use this as an opportunity to note, almost like a journal, what did and what didn't work. 

Example prompt:
> Document in detail each step we took in this data analysis project, focusing on the different options we tried and what did and did not work. Make sure all code is well-documented and replicable. 

:::

:::{tab-item} Intern (Producing Mode)
:sync: tab-intern-document

If you're planning to use AI to gain efficiency in processing data at scale, consider using this as a benchmarking exercise. Now that you've gone through this process with this particular tool with this specific configuration, you can see how other tools and models perform and adjust your decisions about what to use in the future. 

Simply save the prompts (ideally save the conversation as well) with something like:

> Repeat back to me all the prompts I've given you in this conversation.

:::

::::


:::{tip} Tips for documentation

A nice hack I've found is to simply ask the AI to output everything we've done directly as a Jupyter Notebook or Markdown file. To make this process easier, consider:

- **Providing directory structure.** If you're loading files, you'll want to make sure the file paths are correct. Include your directory structure in the prompt so the generated code uses the right paths.

- **Outputting as plaintext first.** Sometimes I've had trouble creating the file directly on ChatGPT, so I have it output the contents as plain text in a code block, paste that into a plaintext editor (e.g. TextEdit on macOS), save it, and change the extension.

Example prompt:
> Create a Jupyter notebook with a detailed walkthrough of the steps taken to achieve our results. Ensure all code is well-commented for easy replication. Instead of saving the file as a .ipynb, create it in a code block that I can paste into a text file.

:::
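
Before renaming a pasted text file to `.ipynb`, a quick sanity check can save some confusion, because a notebook file is JSON with a handful of required top-level keys. The sketch below is a minimal check under that assumption, not a full validation (the `nbformat` package can validate against the complete schema); the helper name `check_notebook_text` is hypothetical.

```python
import json


def check_notebook_text(text: str) -> list[str]:
    """Return a list of problems found in pasted notebook text (empty if none)."""
    try:
        notebook = json.loads(text)
    except json.JSONDecodeError as error:
        return [f"Not valid JSON: {error.msg} (line {error.lineno})"]
    if not isinstance(notebook, dict):
        return ["Top level is not a JSON object"]
    problems = []
    # Top-level keys required by version 4 of the notebook format
    for key in ("cells", "metadata", "nbformat", "nbformat_minor"):
        if key not in notebook:
            problems.append(f"Missing top-level key: {key}")
    return problems


pasted = '{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}'
print(check_notebook_text(pasted))
```

If the check reports a JSON error, the AI's output was probably truncated or wrapped in extra text, and asking it to regenerate only the code block is usually the quickest fix.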

# DIY with AI

Your turn! Experiment with the provided datasets in this notebook. Toggle the options below to get code with the correct dataset paths.

Alternatively, download the zip file of this lab to use in your own Python environment.

Here are a few suggestions about directions to go in with the available datasets:

  
:::{seealso} Play with the Yelp dataset
:class: dropdown

Paste this code into the code block below to get the paths to the files.

```
# Run the setup script (this has helper functions to find the datasets)
%run -i ./utils/setup_utils.py
from pathlib import Path  # explicit import in case the setup script does not expose Path
output, parent_directory = helpers.get_overview()

# Use the search_datasets function with the parent directory
search_string = "Yelp"  # Example search string
dataset_matches = helpers.search_datasets(Path(parent_directory), search_string)

# Print the search results
helpers.print_search_results(dataset_matches)

```

:::

If you're looking for something straightforward to test out your skills with, this is a pretty simple dataset we've curated for that purpose.

**Suggested direction:** Do a sentiment analysis of the Yelp reviews and create a pie chart of the positive, neutral, and negative reviews. We provide a subset as a CSV; you can learn more about the full dataset [here](https://www.yelp.com/dataset/documentation/main).

:::{seealso} Play with the Kaggle datasets
:class: dropdown

Paste this code into the code block below to get the paths to the files.

```
# Run the setup script (this has helper functions to find the datasets)
%run -i ./utils/setup_utils.py
from pathlib import Path  # explicit import in case the setup script does not expose Path
output, parent_directory = helpers.get_overview()

# Use the search_datasets function with the parent directory
search_string = "Kaggle"  # Example search string
dataset_matches = helpers.search_datasets(Path(parent_directory), search_string)

# Print the search results
helpers.print_search_results(dataset_matches)

```

:::

For a more intermediate challenge, we have two sizeable datasets from Kaggle.

**Suggested direction 1:** Practice data wrangling with the Kaggle dataset `shopping_behavior.csv`. We prepared it specifically to contain some obvious missing and duplicate data as well as datatype conversion issues (metadata [here](https://www.kaggle.com/datasets/zeesolver/consumer-behavior-and-shopping-habits-dataset)).

**Suggested direction 2:**  The `bank_churners.csv` file is another Kaggle dataset (metadata [here](https://www.kaggle.com/datasets/sakshigoyal7/credit-card-customers)).
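
For the wrangling direction, a first pass usually means counting the problems before fixing them. The sketch below audits a few made-up rows for missing cells, exact duplicate rows, and values that fail numeric conversion. The column names are illustrative and are not the real `shopping_behavior.csv` schema, and `audit_rows` is a hypothetical helper; with pandas the same checks are typically a few method calls.

```python
import csv
import io

# Made-up rows; the column names are illustrative only
SAMPLE_CSV = """customer_id,age,purchase_amount
1,34,20.5
2,,15.0
2,,15.0
3,forty,9.99
"""


def audit_rows(rows: list[dict], numeric_cols: list[str]) -> dict:
    """Count missing cells, exact duplicate rows, and non-numeric values."""
    seen = set()
    report = {"missing": 0, "duplicates": 0, "bad_numbers": 0}
    for row in rows:
        key = tuple(row.values())
        if key in seen:
            report["duplicates"] += 1
            continue  # count problems in each distinct row only once
        seen.add(key)
        report["missing"] += sum(1 for value in row.values() if value == "")
        for column in numeric_cols:
            value = row[column]
            if value == "":
                continue  # already counted as missing
            try:
                float(value)
            except ValueError:
                report["bad_numbers"] += 1
    return report


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = audit_rows(rows, numeric_cols=["age", "purchase_amount"])
print(report)
```

A report like this is also useful context to paste into your prompt, since it tells the AI what kind of cleaning the data actually needs.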

:::{seealso} Play with the NOAA datasets
:class: dropdown

Paste this code into the code block below to get the paths to the files.

```
# Run the setup script (this has helper functions to find the datasets)
%run -i ./utils/setup_utils.py
from pathlib import Path  # explicit import in case the setup script does not expose Path
output, parent_directory = helpers.get_overview()

# Use the search_datasets function with the parent directory
search_string = "NOAA"  # Example search string
dataset_matches = helpers.search_datasets(Path(parent_directory), search_string)

# Print the search results
helpers.print_search_results(dataset_matches)

```

:::

This is one of the more challenging datasets. You'll have to take care to wrangle things properly and reference the metadata to know what each variable means. 

**Suggested direction:** Merge the NOAA datasets and explore the difference in weather between one Russian and two US locations (metadata is in the folder `./Datasets/NOAA_Weather` and [online](https://www.ncei.noaa.gov/data/global-hourly/doc/)).

## AI-Sandbox

If you want to keep playing around in this environment (i.e. this notebook), you may want to pass the context of your setup on to your AI. To get the necessary information, run the code cell right below (press `Shift` + `Enter` to run it).

```{code-cell} ipython3
# Run this and give it to your AI as context
%run -i ./utils/setup_utils.py
# Print overview of environment and configuration
output, parent_directory = helpers.get_overview()
print(output)
```


### Import your data
Figure out the relevant filepaths and import the data into this Notebook.

```{code-cell} ipython3
# Pick a dataset and copy the code from one of the toggles above and paste it below.
# This will tell you where all the relevant data is.

```

Now, you can import the data and play around with it.

```{code-cell} ipython3
# Your code goes here:

```

### Data Wrangling
Use the space below to run your wrangling.

```{code-cell} ipython3
# Your code goes here:

```

### (Statistical) Analysis 
A separate cell for you to run any statistical analyses you want.

```{code-cell} ipython3
# Your code goes here:

```

### Data Visualization 
An additional cell for any data visualization you want to run.

```{code-cell} ipython3
# Your code goes here:

```
