# Warm-Up

Start by running the usual Library Import cell:

In [1]:
import matplotlib
%matplotlib inline
import numpy as np
import pandas as pd

## Load URLs from CSV

Run the following line to download the `urls.csv` file to your folder.

In [2]:
!curl https://wagon-public-datasets.s3.amazonaws.com/02-Data-Toolkit/02-Data-Sourcing/urls.csv > urls.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    77  100    77    0     0    102      0 --:--:-- --:--:-- --:--:--   102


Then load this CSV in a `urls_df` dataframe using Pandas:

In [3]:
urls_df = pd.read_csv('urls.csv')
urls_df

Unnamed: 0,url
0,https://www.github.com
1,https://stackoverflow.com/questions/tagged/python
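
The `Unnamed: 0` column appears because the first header field in the CSV is empty, so Pandas invents a name for it. The sketch below shows one way to treat that column as the index instead. It uses an in-memory CSV whose contents are an assumption reconstructed from the output above, not the downloaded file itself.

```python
import io

import pandas as pd

# Assumed CSV contents, reconstructed from the dataframe shown above.
csv_text = (
    ",url\n"
    "0,https://www.github.com\n"
    "1,https://stackoverflow.com/questions/tagged/python\n"
)

# Default behaviour: the unnamed first column is kept as a regular column.
default_df = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the first column becomes the index.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_df.columns))  # ['Unnamed: 0', 'url']
print(list(indexed_df.columns))  # ['url']
```

Either form works for the rest of this exercise; `index_col=0` only keeps the dataframe tidier.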


## Enrich Dataset with an API

Let's use the `fetch_metadata` function that we just implemented in the `opengraph.py` file.

First let's import it and make sure that it works in the Notebook. 

1. Write the relevant `from ... import ...` line
1. Call `fetch_metadata` on a URL of your choice. You can type `fetch_` then `<TAB>` to autocomplete, then `<SHIFT> + <TAB>` to view the docstring from your Python file!

In [4]:
from opengraph import fetch_metadata
fetch_metadata('https://www.github.com')

{'image': 'https://github.githubassets.com/images/modules/site/social-cards/campaign-social.png',
 'image:alt': 'GitHub is where over 94 million developers shape the future of software, together. Contribute to the open source community, manage your Git repositories, review code like a pro, track bugs and feat...',
 'site_name': 'GitHub',
 'type': 'object',
 'title': 'GitHub: Let’s build from here',
 'url': 'https://github.com/',
 'description': 'GitHub is where over 94 million developers shape the future of software, together. Contribute to the open source community, manage your Git repositories, review code like a pro, track bugs and feat...',
 'image:type': 'image/png',
 'image:width': '1200',
 'image:height': '630'}

Iterate over the `urls_df` dataframe to add `title` and `description` columns for each URL:

<details>
  <summary>🆘 Hint</summary>

  <p>Have a look at today's lecture: you can start by copy-pasting what we did for <code>tracks_df</code> and adapting the code.</p>

</details>

In [5]:
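# Create the new columns with empty strings as placeholders
urls_df['title'] = ''
urls_df['description'] = ''

# Fetch the metadata of each URL and fill in the matching row
for index, row in urls_df.iterrows():
    metadata = fetch_metadata(row['url'])
    urls_df.loc[index, 'title'] = metadata['title']
    urls_df.loc[index, 'description'] = metadata['description']

urls_df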

Unnamed: 0,url,title,description
0,https://www.github.com,GitHub: Let’s build from here,GitHub is where over 94 million developers sha...
1,https://stackoverflow.com/questions/tagged/python,Newest 'python' Questions,Stack Overflow | The World’s Largest Online Co...
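
The loop above assumes every lookup returns both a `title` and a `description`. If `fetch_metadata` returns `{}` for a page that cannot be found, indexing with `metadata['title']` raises a `KeyError`. A minimal sketch of a more forgiving loop follows. `fake_fetch_metadata`, `FAKE_RESPONSES`, and the `example.com` URLs are hypothetical stand-ins so that it runs without network access.

```python
import pandas as pd

# Hypothetical canned responses standing in for real fetch_metadata calls.
FAKE_RESPONSES = {
    "https://example.com/a": {"title": "Page A", "description": "First page"},
    "https://example.com/b": {},  # simulates a URL whose lookup failed
}


def fake_fetch_metadata(url):
    """Return canned metadata for a URL, or {} when nothing is known."""
    return FAKE_RESPONSES.get(url, {})


demo_df = pd.DataFrame({"url": list(FAKE_RESPONSES)})
demo_df["title"] = ""
demo_df["description"] = ""

for index, row in demo_df.iterrows():
    metadata = fake_fetch_metadata(row["url"])
    # dict.get falls back to '' instead of raising KeyError on a missing key
    demo_df.loc[index, "title"] = metadata.get("title", "")
    demo_df.loc[index, "description"] = metadata.get("description", "")

print(demo_df)
```

Rows whose lookup failed simply keep their empty placeholders, so one bad URL does not stop the whole enrichment.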


## Check your code!

Run the cell below to check your code:

In [6]:
from nbresult import ChallengeResult

result = ChallengeResult('warmup',
    df_columns=urls_df.columns,
)
result.write()
print(result.check())


platform linux -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /home/ninaad/.pyenv/versions/lewagon/bin/python3
cachedir: .pytest_cache
rootdir: /home/ninaad/code/ninzyyy/data-opengraph_api/tests
plugins: anyio-3.6.2, asyncio-0.19.0
asyncio: mode=strict
collecting ... collected 1 item

test_warmup.py::TestWarmup::test_dataframe_has_new_columns PASSED        [100%]



💯 You can commit your code:

git add tests/warmup.pickle

git commit -m 'Completed warmup step'

git push origin master



In [7]:
!git add tests/warmup.pickle
!git commit -m 'Completed warmup step'
!git push origin master

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   Warmup.ipynb
	modified:   opengraph.py

no changes added to commit (use "git add" and/or "git commit -a")
Everything up-to-date


## (Optional) Autoreload

Today's lecture introduced [`autoreload`](https://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html) and why it is useful in the notebook. Let's experiment with it!

Run the following cell. It should return `True` if your function returns `{}` when a website is not found.

In [9]:
%load_ext autoreload
%autoreload 2
fetch_metadata("https://www.a.com") == {}

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


False

Open VS Code and change the behavior of the function so that it returns an empty string `""` rather than `{}` when the HTTP response status is anything other than `200`. Save the file, then re-run the cell above.

Did anything change? No? That's expected! The Notebook kernel imported `opengraph.py` once and keeps that first version of `fetch_metadata` in memory, so later edits to the file are not picked up.

---

OK, let's change the `fetch_metadata` code in VS Code back so that it returns `{}`.

Then, add the following two lines to your first Notebook code cell:

```python
%load_ext autoreload
%autoreload 2
```

Then in the menu bar, go to `Kernel` > `Restart & Run all`.

---

Now that autoreload is enabled, go to VS Code and once again change the behavior so that the function returns an empty string. Re-run the code cell above. Do you get `False`? Good! That means the Notebook is now monitoring changes to imported files such as `opengraph.py`, and reloads them whenever their code changes!
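
To see why the edit was ignored before autoreload was enabled, here is a small sketch using only the standard library. It writes a throwaway module to a temporary folder, edits it on disk, and compares a plain re-import with an explicit `importlib.reload`, which is roughly the step autoreload automates for you. The module name `demo_opengraph` is a hypothetical stand-in for `opengraph.py`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip .pyc caching so the edited source is always recompiled on reload.
previous_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical stand-in for opengraph.py, first version returns {}.
    module_path = Path(tmp_dir) / "demo_opengraph.py"
    module_path.write_text("def fetch_metadata(url):\n    return {}\n")

    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    import demo_opengraph

    first_result = demo_opengraph.fetch_metadata("https://www.a.com")

    # Edit the file on disk, as you would in VS Code.
    module_path.write_text("def fetch_metadata(url):\n    return ''  # edited\n")

    # Importing again is a no-op: Python returns the cached module object.
    import demo_opengraph

    stale_result = demo_opengraph.fetch_metadata("https://www.a.com")

    # Reloading re-executes the file and picks up the edit.
    importlib.reload(demo_opengraph)
    fresh_result = demo_opengraph.fetch_metadata("https://www.a.com")

    # Clean up the import machinery before the folder is deleted.
    sys.path.remove(tmp_dir)
    del sys.modules["demo_opengraph"]

sys.dont_write_bytecode = previous_flag

print(first_result == {}, stale_result == {}, fresh_result == {})
# prints: True True False
```

The middle `True` is the stale result you saw before enabling autoreload, and the final `False` matches what the cell returns once the module is reloaded.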

### Conclusion

Jumping between the Notebook and VS Code might feel confusing at first. Don't worry, you will get used to it. The Notebook is a perfect tool to experiment, to keep notes, to get graphical output of the data, etc. Still, the end goal of a Data Team is to **ship** something (a product, an API, a model, etc.), so productizing the code and refactoring it _out_ of the Notebook into proper Python modules is a critical skill that you will learn!