# Hands-On Exercise 1.1:
# Exploring Jupyter and Loading Data In Python

***

## Objectives

### In this exercise, you will explore the Jupyter environment and the different types of data that can be loaded into Python for processing.

### Overview

Jupyter Notebooks let you create and share documents that contain live code, equations, visualizations and narrative text. In this exercise you will explore that environment and load three kinds of data source: a dataset distributed with an R package, a .csv file, and a web page (web scraping).<br><br>

**Pre-step: Execute the following cell in order to suppress warning messages**

In [None]:
import warnings
warnings.filterwarnings("ignore")
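The cell above turns warnings off for the whole session, which can also hide messages you would want to see. As a sketch of a more targeted alternative, the standard-library `warnings.catch_warnings()` context manager limits a filter to one block and restores the previous filters afterwards. `noisy()` is a hypothetical function that exists only to raise a demo warning.

```python
import warnings

def noisy():
    """Hypothetical function that emits a warning and returns a value."""
    warnings.warn("this is a demo warning", UserWarning)
    return 42

# Suppress warnings only inside this block; the earlier filters are restored afterwards
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy()

# Record warnings instead of hiding them, to see what would have been shown
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy()

print(result, len(caught), caught[0].message)
```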

**Major Step 1: Reviewing the Jupyter environment**


1. ❏ Start jupyter_notebook from the desktop. A window showing Jupyter's log output will open. A browser will then open at http://localhost:8888/tree/#notebooks, showing a folder of files similar to the following: <br>

![image.png](attachment:image.png)

2. ❏ To start a new notebook, click on **New** towards the top right of the dashboard. On the dropdown that appears, select **Python 3** as your kernel. A new tab will appear in the browser. <br>
 <br>
3. ❏ Rename the new notebook from *Untitled* to **MyFirstNotebook**, either by selecting File->Rename or by double-clicking the notebook name to the right of the Jupyter logo at the top of the page<br><br>

4. ❏ In the first cell, try a simple computation such as the following: <br><br>
```python
print(3+2)
```
*Hint: Python is case sensitive*<br><br>
It should look like the cell below:

In [None]:
print(3+2)
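Because Python is case sensitive, `Print` and `print` are different names. A minimal sketch of what happens with the wrong case; the `try`/`except` only keeps the error from stopping the cell:

```python
# Python names are case sensitive: print exists, Print does not
try:
    Print(3 + 2)
except NameError as err:
    message = str(err)          # describes the undefined name
    print("Error:", message)

# The correctly spelled function works as expected
total = 3 + 2
print(total)
```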

5. ❏ Execute the cell by pressing Ctrl+Enter, and notice that a number appears within the [ ] to the left of the cell once it has executed. Re-run the cell and notice that the number increases: it counts the cell executions in the current session<br><br>

6. ❏ Try printing some text such as the following in a new cell:<br><br>
```python
print('Hello' + ' 1 ' + 'World')
```
It should look like the cell below:

In [None]:
print('Hello' + ' 1 ' + 'World')
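In the cell above, `+` joins (concatenates) three strings. It cannot join a string to a number directly; a small sketch of the error and two common fixes, `str()` and an f-string:

```python
# + joins strings, but it cannot join a string to a number directly
try:
    text = 'Hello ' + 1 + ' World'
except TypeError as err:
    print('TypeError:', err)

# Convert the number to a string first, or use an f-string
text1 = 'Hello ' + str(1) + ' World'
text2 = f'Hello {1} World'
print(text1)
print(text1 == text2)
```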

7. ❏ Experiment with inserting cells above and below existing cells and entering similar Python commands in them (e.g. calculations or printed text). Note that you can execute cells one at a time or run all cells in the notebook at once, depending on your requirements<br><br>
*Hint: Menu->Insert->Insert Cell Above*<br><br>

8. ❏ Experiment with clearing the output of cells and re-executing them<br><br>
*Hint: <br>
Menu->Cell->Current Outputs->Clear or <br>
Menu->Cell->All Output->Clear*<br><br>

9. ❏ Insert a cell and change its type to *Markdown* (Menu->Cell->Cell Type->Markdown). Enter some plain text in the cell and then execute the cell. Note the difference in indentation of the output<br><br>

10. ❏ Double-click on the Markdown cell to edit it. Turn the text into a heading by placing a # symbol followed by a space at the beginning of the line, and execute the cell<br><br>

11. ❏ Repeat step 10 with ## and then ### and note how the font size changes each time. This is how you add headings to your Jupyter Notebooks for presentation purposes<br><br>

12. ❏ Experiment with typing several commands into one cell (for example, a few print commands for text and computations, similar to steps 4 and 6) and then splitting the cell so that the commands can be run separately<br><br>
*Hint: Menu->Edit->Split Cell*<br><br>

13. ❏ Experiment with deleting cells<br><br>
*Hint: Menu->Edit->Delete Cells*<br><br>

14. ❏ View the Python Documentation under the Help menu<br><br>
15. ❏ Click on the menu Help -> User Interface Tour for an overview of the Jupyter Notebook App user interface <br>
 <br>
16. ❏ Go to Notebook Ex1_1 and continue with step 17 for the remainder of the exercise.
<br><br>
**Major Step 2: Loading libraries and data into Python**<br>
<br>
17. ❏ Import the *pandas* library in the cell below, using an alias of pd. The syntax is: <br><br>
> import *module_name* as *alias*<br>

In [None]:
import pandas as pd

18. ❏ Import the statsmodels.api library. Use an alias of sm <br>

In [None]:
import statsmodels.api as sm

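19. ❏ Use the **get_rdataset()** function to load the **Duncan** dataset from the **carData** package in R, and store it in a variable called **prestige**. The function downloads the data from the online Rdatasets collection, so this step needs an internet connection; the `.data` attribute holds the data as a pandas DataFrame. <br><br>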
*Syntax:  sm.datasets.get_rdataset('Duncan', 'carData').data*

In [None]:
prestige = sm.datasets.get_rdataset('Duncan', 'carData').data

20. ❏ View the dataset by typing its name, i.e. prestige. Jupyter displays the value of the last expression in a cell, so no print() is needed<br>

In [None]:
prestige
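Once a dataset is loaded, a few pandas calls give a quick overview before you print the whole table. The sketch below wraps them in a hypothetical helper, `summarize()`, and runs it on a made-up three-row DataFrame; the values are invented and are not taken from the Duncan data.

```python
import pandas as pd

def summarize(df, n=5):
    """Hypothetical helper: print a DataFrame's size, column types and first n rows."""
    print("shape:", df.shape)   # (number of rows, number of columns)
    print(df.dtypes)            # data type of each column
    print(df.head(n))           # first n rows

# Made-up toy data, only to demonstrate the calls
toy = pd.DataFrame(
    {"type": ["prof", "bc", "wc"], "income": [60, 20, 40], "prestige": [80, 15, 45]},
    index=["job_a", "job_b", "job_c"],
)
summarize(toy, n=2)
print(toy.describe())           # count, mean, std, min, quartiles, max of numeric columns
```

In your notebook you could call `summarize(prestige)` instead of using the toy data.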

21. ❏ View the current working directory in Python. Relative file names, such as weather.csv in the next step, are looked up in this directory.

In [None]:
import os
os.getcwd()
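Step 22 reads weather.csv by its name alone, which works only if the file is in the current working directory. A sketch of a quick check using `os.listdir()` and `os.path.isfile()`; `locate_file()` is a hypothetical helper, not part of the os module.

```python
import os

def locate_file(filename, folder=None):
    """Hypothetical helper: return the absolute path of filename inside folder
    (default: the current working directory), or None if it is not there."""
    folder = folder if folder is not None else os.getcwd()
    path = os.path.join(folder, filename)
    return os.path.abspath(path) if os.path.isfile(path) else None

print(os.listdir(os.getcwd()))      # files and folders in the working directory
print(locate_file('weather.csv'))   # None unless weather.csv is in this directory
```

If `locate_file('weather.csv')` returns None, you can either pass the full path to `pd.read_csv()` or change the working directory with `os.chdir(r'C:\1264')`.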

22. ❏ Navigate to the C:\1264 directory in **Windows Explorer**. View the structure of the following .csv file using TextPad.

C:\1264\weather.csv

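When you have finished viewing the file, load it into a pandas DataFrame and view it. `pd.read_csv('weather.csv')` looks for the file in the current working directory from step 21; if that is not C:\1264, pass the full path instead.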

In [None]:
data = pd.read_csv('weather.csv')  # load the .csv file into a pandas DataFrame

data
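To see how `pd.read_csv()` interprets a file without needing weather.csv itself, the sketch below reads a small in-memory CSV through `io.StringIO`. The column names and values are hypothetical; the real weather.csv may be laid out differently. `parse_dates` asks pandas to convert the named column to dates, and empty fields are read as NaN.

```python
import io
import pandas as pd

# Hypothetical stand-in for weather.csv; the real file's columns may differ
csv_text = """date,city,max_temp_c,rain_mm
2023-01-01,Ottawa,-5.2,0.0
2023-01-02,Ottawa,-3.1,2.4
2023-01-03,Ottawa,,1.0
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

print(df.shape)         # (rows, columns)
print(df.dtypes)        # the type pandas inferred for each column
print(df.isna().sum())  # number of empty (NaN) fields per column
```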

**Major Step 3: Loading a web page into Python (webscraping)**

23. ❏ Import the *requests* library, and then import the *BeautifulSoup* class from *bs4*

In [None]:
from bs4 import BeautifulSoup
import requests

24. ❏ Load the URL http://en.wikipedia.org/wiki/List_of_countries_by_population into Python and view it.

In [None]:
r = requests.get("http://en.wikipedia.org/wiki/List_of_countries_by_population")  # download the page
r.text[:500]  # view the first 500 characters of the HTML

25. ❏ Parse the downloaded page and print the target (href) of every link on it.

In [None]:
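html = r.text  # raw HTML of the page (a new name, so the weather DataFrame in data is kept)
soup = BeautifulSoup(html, "html.parser")  # parse the HTML into a searchable tree
for link in soup.find_all('a'):  # every <a> (anchor) element on the page
    print(link.get('href'))  # the link target; None if the anchor has no href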
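The loop above prints `href` values exactly as they appear in the page, so many are relative (for example `/wiki/...`), some repeat, and anchors without an `href` print None. A sketch of a hypothetical helper, `extract_links()`, that skips anchors without an `href`, resolves relative links with `urllib.parse.urljoin()` and removes duplicates; it runs on a small made-up HTML string so it works offline.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def extract_links(html, base_url):
    """Hypothetical helper: unique absolute URLs from <a href> tags, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    links, seen = [], set()
    for a in soup.find_all('a', href=True):   # only anchors that have an href
        url = urljoin(base_url, a['href'])     # resolve relative links such as /wiki/China
        if url not in seen:                    # keep the first occurrence only
            seen.add(url)
            links.append(url)
    return links

# Made-up HTML fragment for illustration
sample = ('<p><a href="/wiki/China">China</a> <a href="/wiki/India">India</a> '
          '<a href="/wiki/China">again</a> <a name="top">no href</a></p>')
links = extract_links(sample, "https://en.wikipedia.org/wiki/List_of_countries_by_population")
for url in links:
    print(url)
```

With the page from step 24 you could call `extract_links(r.text, r.url)`, where `r.url` is the final URL of the response after any redirects.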

## <center>**Congratulations! You have successfully loaded various types of data into Python.**</center>

![image.png](attachment:image.png)

# <center>**This is the end of the exercise.**</center>