# Introduction to GitHub Copilot for Data Science

---
## Intro

This repository contains the source code for the complete workshop. You will follow the step-by-step guide below, completing all the steps while working with data and GitHub Copilot within Codespaces.

> 📝 **Note:**
> This repo is intended to give an introduction to various **GitHub Copilot** features, such as **Copilot Chat** and **inline chat**. The step-by-step guides below therefore give only a general description of what needs to be done; Copilot Chat or inline chat can help you generate the necessary commands.
>
> Each step (where applicable) also contains a `Cheatsheet` which has some suggested prompts if you get stuck.

> 💡 Play around with different prompts and see how they affect the accuracy of the GitHub Copilot suggestions. For example, when using inline chat, you can use an additional prompt to refine the response without having to rewrite the whole prompt.

### Data Science Project features

In this workshop, you will be working with CSV data included in this repository, as well as a Jupyter Notebook that starts some analysis of the data. Here are some features of the project you will work with:

1. Consume a CSV dataset and clean it
1. Identify correlations and create polished, exportable results
1. Fast track creating a new analysis notebook from scratch

## Workshop Preparation


This repository is Codespaces-ready and is pre-configured so that you have all dependencies installed including the Visual Studio Code extensions necessary to work with GitHub Copilot, Jupyter Notebooks, and Python:

- GitHub Copilot
- Python extension
- Jupyter extension
- Pre-installed Python dependencies with an activated Virtual Environment

> ❗
> If you use this repository in your own account or in a non-GitHub-Universe organization, you might incur charges or consume your free Codespaces quota.

### 1. Create a new repository from this template

Progress: [🟢⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪] 1/12 (8%)

⏳ **~2min**

- Click `Use this template` $\rightarrow$ `Create a new repository`
- Set the owner to #TODO: What org for this?
- Give it a name
- Set visibility to `Private`
- Click `Create repository`

### 2. Create a Codespace using the provided template

⏳ **~3min**

- In the newly created repo, click `Code` $\rightarrow$ `Codespaces` $\rightarrow$ `[ellipsis menu]` $\rightarrow$ `New with options` $\rightarrow$ _Ensure that `Dev container configuration` is set to `Default project configuration`_ $\rightarrow$ `Create codespace`
- ❗If you're having problems launching the Codespace then you can also clone the repo and continue from here in your IDE:

    ```sh
    git clone https://github.com/<YOUR_NAME_SPACE>/<YOUR_REPO_NAME>.git
    cd <YOUR_REPO_NAME>
    ```

> 📝 **Note:** There is no need to push changes back to the repo during the workshop

### 3. Verify Python is installed and set correctly

⏳ **~2min**

- Use the command palette to toggle the terminal (search for "Create new terminal")
- Run `which python` and make sure it points to the Virtual Environment (`/home/vscode/venv/bin/python`)
- Run `which pip` and ensure that it also points to the Virtual Environment (`/home/vscode/venv/bin/pip`)
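
If you prefer to confirm this from inside Python (for example in a notebook cell), the sketch below compares `sys.prefix` with `sys.base_prefix`, which differ when the interpreter runs inside a virtual environment. The helper name `in_virtual_environment` is made up for this example and is not part of the workshop code.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the base installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtual_environment())
```

In the Codespace, the interpreter path printed here should match the one reported by `which python`.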

### 4. Open relevant files

⏳ **~2min**

GitHub Copilot benefits from having context. One way to provide context is to open relevant files; alternatively, you can add them to your prompt context later.

- Open the `villagers.csv`, `furnature.csv` and `villagers_analysis.ipynb` files.

## Data Wrangling

### 1. See how much you can learn about the project and the data

⏳ **~5min**

- Open GitHub Copilot Chat
- Use the `@workspace` agent to ask Copilot about the nature of the data you are going to work with
- Also ask `@workspace` which Python packages the project uses, and in what ways

### 2. Review the data files with Data Wrangler

⏳ **~5min**

- Right-click on `data/acnh-data/fish.csv` and open it with Data Wrangler. If it needs a kernel selected, choose the `venv` Python environment
- Scroll around and look at the state of the data, noting all of the column summaries at the top
- Using the Copilot agent built into Data Wrangler in the panel below the grid, ask it to change the values of the `Rain/Snow Catch Up` column to a bool type with 0 as false
- Export your new data as a CSV file named `fish-cleaned.csv` via the `Export to CSV` button just above the grid

> 💡 If you need to repeat the same cleaning process many times, you can export your steps in Data Wrangler to a reusable notebook or copy them to your clipboard as Python code!


<details>
<summary>Cheatsheet</summary>

##### Prompt

```sh
Clean up the rain/snow column so that it is a bool type and No is false
```

</details>
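
To get a feel for what the cleaning step does, here is a minimal sketch of the same conversion written by hand. It uses only the standard library `csv` module so it runs without extra packages; it is not the code Data Wrangler generates. The sample rows, the fish names, and the set of values treated as false are assumptions made for illustration.

```python
import csv
import io

# Made-up sample rows; the real fish.csv has many more columns and rows.
SAMPLE_CSV = """Name,Rain/Snow Catch Up
Fish A,No
Fish B,Yes
"""

# Assumed spellings that should become False; adjust to match the real data.
FALSE_VALUES = {"no", "0", "false", ""}


def to_bool(value: str) -> bool:
    """Map a raw cell to a bool, treating the assumed false spellings as False."""
    return value.strip().lower() not in FALSE_VALUES


def clean_rows(source, column="Rain/Snow Catch Up"):
    """Read CSV rows from a file-like object and convert one column to bool."""
    rows = []
    for row in csv.DictReader(source):
        row[column] = to_bool(row[column])
        rows.append(row)
    return rows


cleaned = clean_rows(io.StringIO(SAMPLE_CSV))
for row in cleaned:
    print(row["Name"], row["Rain/Snow Catch Up"])
```

To try it on the real file, pass an open file handle (opened with `newline=""`) to `clean_rows` instead of the `io.StringIO` sample.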

## Initial analysis

### 3. Start a new notebook for the cleaned fish data

⏳ **~5min**

- Open the Copilot Chat panel from the icon in the top menu bar or from the command palette (View: Toggle Chat)
- In the text box at the bottom click the add context button and select `Files & Folders...`, navigate to your newly cleaned `fish-cleaned.csv`, and select it
- In the bottom left of the chat box, select the `ds-create` mode for the chat.
- Ask Copilot to start a new project for this data in the chat panel text box


<details>
<summary>Cheatsheet</summary>

##### Prompt

```sh
Start a new analysis for this data
```

</details>

### 4. Review and edit visualizations inline in notebooks

⏳ **~3min**

- In your newly created notebook, go to a cell output that has some sort of plot
- Click and drag the plot into the Copilot Chat panel and ask Copilot to change something about it (you can also add it by clicking the `...` at the top right of the cell)


<details>
<summary>Cheatsheet</summary>

##### Prompt

```sh
Can you make all of these plots higher DPI and use colorblind-safe color palettes?
```

</details>

## Extending and refining notebooks

### 5. Continue the villager analysis with the `ds-categorical-analysis` agent

⏳ **~5min**

- Open the `villager_analysis.ipynb` file and the Copilot Chat side panel
- Add the notebook to the context of the chat and switch the agent to the `ds-categorical-analysis` mode.
- Ask Copilot to extend the analysis of the villagers and see if any villager traits are correlated.


<details>
<summary>Cheatsheet</summary>

##### Prompt

```sh
Help me figure out if any properties of the villagers are correlated.
```

</details>
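
Most villager traits are categorical, so a plain Pearson correlation does not apply directly. One common measure of association between two categorical columns is Cramér's V, sketched below in pure Python. This is an illustration of the idea, not necessarily what the `ds-categorical-analysis` mode will generate. The trait values in the example are made up, and the version shown has no bias correction, so it can overstate association on small samples.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Return Cramér's V (0 = no association, 1 = perfect) for two categorical sequences."""
    if len(xs) != len(ys):
        raise ValueError("both sequences must have the same length")
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    pair_counts = Counter(zip(xs, ys))
    k = min(len(x_counts), len(y_counts))
    if n == 0 or k < 2:
        return 0.0  # association is undefined with fewer than two categories

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, x_n in x_counts.items():
        for y, y_n in y_counts.items():
            expected = x_n * y_n / n
            observed = pair_counts.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


# Made-up example values, not taken from villagers.csv.
species = ["Cat", "Cat", "Dog", "Dog"]
personality = ["Lazy", "Lazy", "Peppy", "Peppy"]
print(cramers_v(species, personality))
```

Values close to 0 suggest the two traits are unrelated, while values close to 1 suggest that knowing one trait largely determines the other.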

### 6. Prepare graphics for export

⏳ **~5min**

- In a cell generating plots in the notebook, open inline Copilot chat (Ctrl-i / Cmd-i) and ask the agent to help add a method to export the plots in that cell as high-resolution PNGs

<details>
<summary>Cheatsheet</summary>

##### Prompt

```sh
Export all the plots in this cell as high DPI PNGs 
```

</details>

## Bonus

There are a couple of bonus challenges if you've completed all the tasks and your scripts are in good shape.

### Bonus Challenge 1 - Create a robust CLI tool

Progress: [🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢] 12/12 (100%) 🎉  
Optional: [🚀 Turbo Boost! (1/3) 33%]

- Use GitHub Copilot chat with the `@workspace` prefix to convert the project into a CLI with options and a help menu
- Ensure that the prompt specifies no external dependencies and it should only use the standard library
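
For orientation, a CLI that really uses only the standard library could look like the sketch below, built on `argparse` and `csv`. The subcommand names `columns` and `count` and the helper names are hypothetical; Copilot's suggestion for your project will differ.

```python
import argparse
import csv


def build_parser() -> argparse.ArgumentParser:
    """Create a parser with a help menu and two illustrative subcommands."""
    parser = argparse.ArgumentParser(description="Inspect a CSV file (illustrative sketch)")
    subparsers = parser.add_subparsers(dest="command", required=True)

    columns = subparsers.add_parser("columns", help="List the column names")
    columns.add_argument("path", help="Path to the CSV file")

    count = subparsers.add_parser("count", help="Count the data rows")
    count.add_argument("path", help="Path to the CSV file")
    return parser


def summarize(command: str, lines) -> str:
    """Apply a subcommand to an iterable of CSV lines and return the text to print."""
    reader = csv.reader(lines)
    header = next(reader, [])
    if command == "columns":
        return "\n".join(header)
    return str(sum(1 for _ in reader))  # "count": data rows after the header


def main(argv=None) -> int:
    """Parse the arguments, read the file, and print the result."""
    args = build_parser().parse_args(argv)
    with open(args.path, newline="", encoding="utf-8") as handle:
        print(summarize(args.command, handle))
    return 0
```

Calling `main()` under the usual `if __name__ == "__main__":` guard turns this into a script. Running it with `--help` then prints the menu that `argparse` generates from the `help=` strings.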

<details>
<summary>Cheatsheet</summary>

#### Prompt

```shell
@workspace I want to convert this project into a CLI with a help menu. Help me do this without using any dependencies, just pure Python standard library
```

#### Expected output

```python
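import argparse

import pandas as pd

# The transformation helpers called below (drop_notes, select_high_ratings, ...)
# are assumed to be defined elsewhere in the same module.


def main():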
    parser = argparse.ArgumentParser(description="DataFrame manipulation CLI")
    subparsers = parser.add_subparsers(dest="command")

    subparsers.add_parser("drop_notes", help="Drop the 'notes' column from the DataFrame")
    subparsers.add_parser("select_high_ratings", help="Select rows where the 'rating' column is 90 or higher")
    subparsers.add_parser("drop_and_one_hot_encode_red_wine", help="One-hot encode 'Red Wine' and drop 'variety' column")
    subparsers.add_parser("remove_newlines_carriage_returns", help="Remove newlines and carriage returns from string columns")
    subparsers.add_parser("convert_ratings_to_int", help="Convert the 'rating' column from float to integer")

    args = parser.parse_args()

    # Load your DataFrame here
    df = pd.read_csv('workshop/train.csv')

    if args.command == "drop_notes":
        df = drop_notes(df)
    elif args.command == "select_high_ratings":
        df = select_high_ratings(df)
    elif args.command == "drop_and_one_hot_encode_red_wine":
        df = drop_and_one_hot_encode_red_wine(df)
    elif args.command == "remove_newlines_carriage_returns":
        df = remove_newlines_carriage_returns(df)
    elif args.command == "convert_ratings_to_int":
        df = convert_ratings_to_int(df)
    else:
        # No (or unknown) command: show the help and skip saving.
        parser.print_help()
        return

    # Save the transformed DataFrame
    df.to_csv('workshop/transformed_train.csv', index=False)

if __name__ == "__main__":
    main()
```
</details>

### Bonus Challenge 2 - Document your project

Progress: [🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢] 12/12 (100%) 🎉  
Optional: [🚀 Turbo Boost! (1/3) 33%]  
Optional: [🌟 Extra Points! (2/3) 66%]

- Create a new `docs/` folder and a file called `README.md`
- Use GitHub Copilot chat with the `@workspace` prefix to get started documenting your project using Markdown in a README.md
- In your prompt, ask for help documenting the project goals, the nature of the data in the CSV files, and how the CLI works.

<details>
<summary>Cheatsheet</summary>

#### Prompt

```shell
@workspace help me create a good README.md file in Markdown so that I can document this and help others understand how it works and the nature of the data
```

#### Expected output

```markdown
# DataFrame Manipulation CLI

This project provides a command-line interface (CLI) for manipulating a DataFrame using various operations. The CLI is built using the Python standard library and does not require any external dependencies.

...
```
</details>

### Bonus Challenge 3 - Automate the data transformation

Progress: [🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢] 12/12 (100%) 🎉  
Optional: [🚀 Turbo Boost! (1/3) 33%]  
Optional: [🌟 Extra Points! (2/3) 66%]  
Optional: [🏆 Triple Threat! (3/3) 100%]

- Create a new file called `transform-data.yaml` in the `.github/workflows/` directory.
- Use GitHub Copilot chat with the `@workspace` prefix to create a GitHub Action that will transform the data whenever a push or pull request is made to the repository.
- Open a pull request to test the action. If any errors occur, use the GitHub Copilot chat to help you fix them.

<details>
<summary>Cheatsheet</summary>

#### Prompt

```shell
@workspace Generate a GitHub action that transforms the data on every push and pull request 
```

#### Expected output

```markdown
To create a GitHub Action that will transform the data using your CLI, you can create a workflow file in the `.github/workflows` directory. Here is an example of a GitHub Action workflow file named `transform-data.yaml`:

    ```yaml
    # Workflow omitted, since this is the final bonus!
    ```

This workflow will:

* Trigger on pushes to the main branch and on manual dispatch.
* Check out the repository.
* Set up Python.
* Install the necessary dependencies (in this case, pandas).
* Run the data transformation using the `run_all` command from your CLI.
* Upload the transformed data as an artifact.
```

</details>

## Clean-up

### 1. Delete your Codespace

⏳ **~1min**

Before deleting, you can push your changes if you wish. Remember that workshop repositories are temporary too.

Go to https://github.com/codespaces and find your current running Codespace and delete it.

## Additional resources

If you want to learn more about using GitHub Copilot, check out these resources:

* [GitHub Copilot Documentation](https://docs.github.com/copilot)
* [VS Code video series: GitHub Copilot](https://www.youtube.com/playlist?list=PLj6YeMhvp2S7rQaCLRrMnzRdkNdKnMVwg)
* [Blog: Best practices for prompting Copilot](http://blog.pamelafox.org/2023/06/best-practices-for-prompting-github.html)

Also check out the [GitHub Foundations learning path](https://learn.microsoft.com/training/paths/github-foundations/) for more resources on GitHub and GitHub Copilot.