# DS109 Python : Lesson Eight Python for Data Science

### Table of Contents <a class="anchor" id="DS109L8_toc"></a>

* [Table of Contents](#DS109L8_toc)
    * [Page 1 - Python for Data Science](#DS109L8_page_1)
    * [Page 2 - Get Familiar with Jupyter Notebook!](#DS109L8_page_2)
    * [Page 3 - Data Science-Specific Packages](#DS109L8_page_5)
    * [Page 4 - Importing Data for Windows](#DS109L8_page_6)
    * [Page 5 - Importing Data for Mac / Linux](#DS109L8_page_7)
    * [Page 6 - Viewing Files in Pandas](#DS109L8_page_8)
    * [Page 7 - Descriptive Statistics](#DS109L8_page_9)
    * [Page 8 - Keyboard Shortcuts for Jupyter Notebook](#DS109L8_page_10)
    * [Page 9 - Key Terms](#DS109L8_page_11)
    * [Page 10 - Lesson 8 Hands-On](#DS109L8_page_12)
    

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 1 - Python for Data Science<a class="anchor" id="DS109L8_page_1"></a>

[Back to Top](#DS109L8_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

In [1]:
from IPython.display import VimeoVideo
# Tutorial Video Name: Python for Data Science
VimeoVideo('388856304', width=720, height=480)

The transcript for the above overview video **[is located here](https://repo.exeterlms.com/documents/V2/DataScience/Video-Transcripts/DSO109L08overview.zip)**.

You have already had a wonderful introduction to object-oriented programming in Python.  A good understanding of those basics, which are not specific to data science, is very important for several reasons: 

* For more complex work in machine learning or big data, you may need the ability to utilize a text editor like Visual Studio Code and the command line.
* In data wrangling, you may find yourself using for loops, dictionaries, and other basics.
* Hiring managers may not distinguish between the use of Python for software engineering and Python for data science, which means that you may take skills assessments that will depend on your ability to use Python as an object-oriented language in order to land a job.

---

## How is Python Used for Data Science?

Conceptually, using Python for Data Science is very similar to how you use R.  You'll learn the basic packages (think the equivalent of libraries in R) needed to import and manipulate data, perform data wrangling, conduct statistics, perform machine learning, and so much more! Python is nice for data science because it is so very flexible.  

---

## Comparing Python to R as a Data Science Tool

R was built for statistics and statistics only.  With that in mind, it provides several advantages:

* The output from R typically provides more information about the analysis you ran and it is easier to get ahold of this output.
* You are allowed to have missing data in many R data analyses, as it will use pairwise deletion. This usually gives you greater power and a little more flexibility with your data.
* It sometimes requires less code  or fewer steps to run analyses or data visualizations.
* R has a wider range of traditional statistical methods and visuals.

The advent of Python for data science really stemmed from the programming world instead of the statistical world, which means that using Python has its own set of advantages! 

* Python is typically better equipped for machine learning.
* Python connects better to big data programs such as Spark and Hadoop.
* It is easier to integrate basic programming elements into Python.
* There is more documentation for Python and more people are currently using it, which means that it will be easier to get help solving problems.

Because Python and R have different strengths, you are encouraged to use both and really get familiar with what each program is capable of.  Then, as you become comfortable, you will be able to choose the program that best suits your data science needs of the day.  Some things are just easier in Python, or easier in R.  It would be silly to struggle through something in Python that is easily completed in R, or vice versa. Think of both programs as complementary, rather than redundant.

---


<hr style="height:10px;border-width:0;color:black;background-color:red">

## Please disregard if you have already utilized Chocolatey or Homebrew scripts to install Jupyter!

If you have Windows and have not yet installed all data science applications, please use our handy [Chocolatey Script](../DS101-Basic-Statistics/Installation/Chocolatey.ipynb) and follow the directions for Jupyter Lab explicitly!

If you have a Mac and have not yet installed all data science applications, please use our handy [Homebrew Script](../DS101-Basic-Statistics/Installation/Homebrew.ipynb) and follow the directions for Jupyter Lab explicitly!

<hr style="height:10px;border-width:0;color:black;background-color:red">




<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 2 - Get Familiar with Jupyter Notebook!<a class="anchor" id="DS109L8_page_2"></a>

[Back to Top](#DS109L8_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Get Familiar with Jupyter Notebook!

<div class="panel panel-success">
    <div class="panel-heading">
        <h3 class="panel-title">Additional Info!</h3>
    </div>
    <div class="panel-body">
        <p>You may want to watch this <a href="https://vimeo.com/427596949"><b> recorded live workshop </b></a> that goes over the basics of using Jupyter Notebook, which will be covered in the next few pages of your lesson. </p>
    </div>
</div>

Here is an up close and personal view of the Jupyter Notebook menu.

![Jupyter window with menu options file, edit, view, insert, cell, kernel, widgets, and help. There is a search box.](Media/Anaconda13.png)

---

## Renaming Your File

You'll notice that the first thing you have on the top is a label saying this is ```Untitled```.   If you click there, it will bring up an option to rename the notebook to something you'll actually remember.  Something with the module name and lesson number in it is recommended so that you can easily refer to your work later as you move forward with the course.

![Window titled rename notebook. Enter a new notebook name. A textbox is shown where the new name can be typed in. There is a rename button in the lower right corner.](Media/Anaconda14.png)

Once you've typed in a name, press the big blue ```Rename``` button to save it and close the window. 

---

## The File Menu

In this File menu, there are some noteworthy features:  

* You can create a new notebook.
* You can copy your notebook here.
* You can save, though this is also done with the floppy disk icon.
* If you screw up really, really badly, you can choose the option to ```Revert to Checkpoint``` for a particular day and time.  

    <div class="panel panel-info">
        <div class="panel-heading">
            <h3 class="panel-title">Tip!</h3>
        </div>
        <div class="panel-body">
            <p> Typically this option isn't necessary, because you can just re-run cells, but it's there in case of catastrophic failure!</p>
        </div>
    </div>

---

### Creating a New Notebook

If you want a new notebook in the same folder as your current one, you can use the ```File``` menu and choose the ```New Notebook``` option.  

![Jupyter menu with the pull down file menu shown. It has the options new notebook, open, make a copy, rename, save and checkpoint, revert to checkpoint, print preview, download as, and close and halt.](Media/Anaconda15.png)

<div class="panel panel-info">
    <div class="panel-heading">
        <h3 class="panel-title">Tip!</h3>
    </div>
    <div class="panel-body">
        <p>If you want a new notebook in a different folder, you will need to go back to your Jupyter directory in the previous tab, navigate to that folder, and then click "New" as you did on the installation instructions page.</p>
    </div>
</div>

---

## The Edit Menu

In the Edit menu, you'll see these options to manipulate the cells: 

![Jupyter menu with the pull down edit menu shown. Options are cut cells, copy cells, paste cells above, paste cells below, paste cells and replicate, delete cells, undo delete cells, split cell, merge cells above, merge cells below, move cell up, move cell down, edit notebook metadata, find and replace, cut cell attachments, copy cell attachments, and paste cell attachments.](Media/Anaconda16.png)

Feel free to explore as you see fit, but typically the plus button to add cells and the scissors symbol to cut cells are all most folks end up needing.  By default, the plus sign adds a new cell below the one you currently have selected. 

![Menu bar with icons for save, cut, copy paste, up, down, run, stop, revert, fast forward, markdown, and keyboard.](Media/Anaconda34.png)

---

## The Cell Menu

Next you have the ```Cell``` menu.  While the notebook is open, the results of every cell you have run stay in memory, so cells further down the page can build on the ones above them.  But, if you close the notebook, you will need to re-run your cells before you can keep working.  While this can be done cell by cell by clicking into each one and using ```shift + enter```, you can also use the ```Cell``` menu to run several cells at once.  Typically ```Run All``` is the most useful, but depending on your need you can run only the selected cells, or everything above or below a certain point. 

![Jupyter menu with the pull down cell menu shown. Options are run cells, run cells and select below, run cells and insert below, run all, run all above, run all below, cell type, content output, all output.](Media/Anaconda17.png)

---

### Using Markdown

The other thing to point out on the Cell Menu is the Cell Type. The default is code, so in your case, Python.  But there's also an option for markdown.  *Markdown* is an easy way to write plain text that renders as nicely formatted, HTML-like output.  So, instead of making notes to yourself the usual way, with a comment that starts with a hashtag (#), you can change the cell type to markdown, and it will render as formatted text once you press ```shift + enter```.  

Here are the basics of markdown: 

![Window with text that reads Hello students! This is what markdown looks like. There are generic heading 2, 3, and 4 at the bottom of the screen.](Media/Anaconda18.png)

And this is what it will look like after you ```shift + enter```: 

![Window with text that reads Hello students! This is what markdown looks like. There are generic heading 2, 3, and 4 at the bottom of the screen. The text is cleaned up and appears like normal print instead of code.](Media/Anaconda19.png)

<div class="panel panel-success">
    <div class="panel-heading">
        <h3 class="panel-title">Want to learn more markdown? </h3>
    </div>
    <div class="panel-body">
        <p>Check out <a href="https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet"> this reference! </a></p>
    </div>
</div>

---

## Kernel Menu

Another useful menu you have along the top is the Kernel Menu.  There may come a time when you are trying to work in Jupyter and it will throw up an error message in the top right hand corner that it has disconnected.  If that happens, don't panic - just choose the ```Reconnect``` option in the Kernel menu, and/or the ```Restart & Run All``` option.  That should get you back to work quickly without too much fuss.

![Jupyter menu with the pull down kernel menu shown. Options are interrupt, restart, restart and clear output, restart and run all, reconnect, shutdown, and change kernel.](Media/Anaconda20.png)

---

## Help Menu

The most notable feature in the ```Help``` menu is the ```Keyboard Shortcuts``` option. Click on this, and you'll find a list of all the keyboard shortcuts Jupyter Notebook supports: 

![Jupyter screen of the help menu. There are several keyboard shortcuts listed.](Media/Anaconda36.png)

You'll notice here that there are two modes Jupyter can utilize: ```Command mode``` and ```Edit mode```. ```Edit mode``` is the default. In it, you've already learned the shortcut ```Shift + enter``` to run cells.  A couple other handy shortcuts in ```Edit mode``` are: 

* **Tab**: Provides code completion.
* **Shift + Tab**: Pulls up a tooltip that gives you information about the code and all the arguments available for it.
* **Ctrl + z**: Undo an action

There is also a ```Command mode```, which you can activate by pressing ```Esc```. These shortcuts are primarily for moving around in your notebook, but if you find you use Markdown a lot in your notebooks (and you should - it nicely keeps everything organized!) then these are helpful: 

* **m**: Change cell to markdown
* **1**: Change cell to the first markdown heading
* **2**: Change cell to the second markdown heading
* **3**: Change cell to the third markdown heading

---

In [1]:
try:
    from DS_Students import MultipleChoice  
except:
    !pip install DS_Students
    from DS_Students import MultipleChoice
from ipynb.fs.full.DS109Questions import *

In [None]:
try:
    display(L8P4Q1, L8P4Q2, L8P4Q3, L8P4Q4, L8P4Q5)
except:
    pass

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 3 - Data Science-Specific Packages<a class="anchor" id="DS109L8_page_5"></a>

[Back to Top](#DS109L8_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Data Science-Specific Packages

Just like libraries in R, there are a plethora of different packages to use in Python! Here are some of the more common ones you will encounter, and their typical abbreviations: 

* **pandas (pd):** Yes, you read that right - pandas! The pandas package allows you to import data and manipulate data frames easily.
* **NumPy (np):** NumPy (pronounced num-pie) is a package that allows you to work with arrays.
* **SciPy (sp):** SciPy has a lot of statistics in it.
* **statsmodels (sm):** If the stat you're looking for isn't covered in SciPy, try statsmodels! The abbreviation usually refers to its `statsmodels.api` module.
* **scikit-learn (sklearn):** Machine learning package. It is installed under the name `scikit-learn` but imported as `sklearn`.
* **matplotlib (plt):** Matplotlib has a lot of graphing capabilities and other visualizations. The abbreviation refers to its `matplotlib.pyplot` module.
* **seaborn (sns):** Another good data visualization package.  We will also use it often because it has some great built in datasets.

You will absolutely encounter other packages, but those are a few of the more common ones that you will see over and over again.
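As a rough illustration of how these abbreviations show up in practice, the sketch below imports two of the packages under their usual aliases and builds a tiny data frame. It assumes `pandas` and `numpy` are already installed; the column name and the values are made up for the example.

```python
import numpy as np   # arrays
import pandas as pd  # data frames

# Conventional aliases for some of the other packages, left as comments so
# this cell still runs if they are not installed on your machine:
#   import statsmodels.api as sm
#   import matplotlib.pyplot as plt
#   import seaborn as sns

# A NumPy array of made-up values...
values = np.array([1.5, 2.5, 3.5])

# ...wrapped in a pandas data frame with one hypothetical column.
demo = pd.DataFrame({"measurement": values})
print(demo)
```

Once a package is imported with an alias, every function from it is called through that alias, which is why you will see `pd.` and `np.` at the start of so many lines of code.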

---
## Installing Data Science Packages

If you're using Jupyter, most of the common packages will already be installed and ready to go. You don't need to install them, just import them into your Notebook.  However, at some point, you may start using some of the more unusual packages, and there will come a time when you will need to install one yourself. If that ever happens, simply run a cell in your Jupyter Notebook that starts with `! pip install` followed by the name of the package you need. So if you needed "quandl", the line of code would look like this:

```python
! pip install quandl
```




```quandl``` is a package that lets you work with financial data. When the cell finishes running, you should get a message that the package was installed successfully.  This may take some time, depending on the size of the package and on whether it needs other dependent packages to work properly. 
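If you are not sure whether a package is already available, you can check before installing anything. The sketch below is one possible way to do that with Python's standard library; the helper name `is_installed` is made up for this example, and the list of package names is only an illustration.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


# Check a few package names; swap in whichever ones you care about.
for name in ["pandas", "numpy", "quandl"]:
    if is_installed(name):
        print(f"{name}: ready to import")
    else:
        print(f"{name}: not found, try running ! pip install {name}")
```

Note that this uses the name you would type after `import`, which is not always the same as the name you install with `pip`.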

---

## Installing Packages for Python with a Text Editor

If you're wedded to your text editor, then you can still use it for data science tasks - but many things become a little less user friendly. Installing packages is one of them. It can be done, but you'll need to be in your terminal. You'll want to type out ```pip install``` and then the package name.  An example would look like this:

```bash
pip install quandl
```

Again, for the majority of the course, demonstrations will be done in Jupyter, but you're free to explore what works best for you.  

---

In [2]:
try:
    display(L8P5Q1, L8P5Q2)
except:
    pass


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 4 - Importing Data for Windows<a class="anchor" id="DS109L8_page_6"></a>

[Back to Top](#DS109L8_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Importing Data for Windows

These directions are for importing data on a Windows computer.  If you are working on a Mac or Linux computer, please move to the next page. 

---

## Importing a Package

First, you will import your very first Python package in Jupyter.  ```pandas``` is one of the best data science specific packages, and so you'll start there.  In your empty line, type: 

```python
import pandas as pd
```

This is telling Python that you want the ```pandas``` package, and that in your every day use, you will be abbreviating ```pandas``` as ```pd``` from now on in this notebook.  After you've typed this in, press the shift and enter keys at the same time to run this cell, or use your mouse to click on the play button at the very top of the menu that conveniently says ```Run```. As soon as you do this, you will notice that the empty brackets next to the cell turn to a star.  That means Jupyter is processing the command.  Once it has finished processing, that star will turn to a number showing the order in which the cell was run, and a new empty cell will appear.

![Python import text box with the text import pandas as pd in it.](Media/Anaconda21.png)

Now that you've imported the ```pandas``` package, you want to bring in some data to work with! This is something that ```pandas``` excels at.  

---

## Importing Data

See if you can bring in **[this dataset](https://repo.exeterlms.com/documents/V2/DataScience/Data-Wrang-Visual/bikes.zip)** about bike sharing.  Here's some example code from a Windows machine: 

```python
bikes = pd.read_csv('../Data/bikes.csv')
```

The function ```pd.read_csv()``` comes from the ```pandas``` package (which you have abbreviated ```pd```), and is meant to read in data from a ```.csv``` format.  In the parentheses as an argument, you'll put the directory pathway to where you've stored the file. Don't copy and paste this exactly! You will need to change the directory route to find your own file.  You can find out this directory by navigating to your file in the file explorer and then clicking in the address bar at the top, which will highlight the full pathway to where you are. Then you can copy this pathway into your Jupyter Notebook. 

However, you will need to change which way the slashes are facing.  In your Windows file directory system, the pathway is written with this slash ( \ ).  This is called a backslash.  In Jupyter, you should write the pathway with this slash ( / ) instead.  This is called a forward slash.  Python treats a backslash inside quotes as the start of a special character, so you can get an error, or end up pointing at the wrong file, if you leave the backslashes in.  Lastly, don't forget to add the name of the actual file to the end, as well as the file extension! In this case, it is ```bikes.csv```. 
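If you would rather not retype the slashes by hand, Python can convert a pasted path for you. This is a minimal sketch using the standard library's `pathlib`; the folder names in the path are hypothetical, so substitute the pathway you copied from your own computer.

```python
from pathlib import PureWindowsPath

# Hypothetical path, pasted exactly as the Windows file explorer shows it.
# The r in front of the quotes makes this a "raw" string, so Python reads
# every character literally instead of treating \ as something special.
pasted = r"C:\Users\student\Data\bikes.csv"

# as_posix() rewrites the same path using / between the folders.
cleaned = PureWindowsPath(pasted).as_posix()
print(cleaned)  # C:/Users/student/Data/bikes.csv
```

The cleaned-up text is what you would then place inside the quotes of ```pd.read_csv()```.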

The code you see above is for ```.csv``` files, which is the predominant file type you'll use in this course.  However, if you wanted to bring in an MS Excel file, then instead of using the ```pd.read_csv()``` function for the file extension ```.csv```, you could use the ```pd.read_excel()``` function for the file extension ```.xlsx```.  Either works marvelously - it just depends on what format your data is in to begin with!
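To see what ```pd.read_csv()``` gives you back without worrying about file pathways at all, you can hand it a small piece of CSV text directly. This is only a sketch: the three columns borrow names from the bikes data, but the values are made up, and `io.StringIO` is just a standard library trick for making a string behave like a file.

```python
import io

import pandas as pd

# Made-up CSV text: a header row followed by two rows of data.
csv_text = "instant,season,cnt\n1,1,100\n2,1,150\n"

# io.StringIO lets read_csv treat the text as if it were a file on disk.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)
```

The first line of the text becomes the column names, and every line after it becomes one row of the data frame, which is exactly what happens when you read a real ```.csv``` file.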

<div class="panel panel-info">
    <div class="panel-heading">
        <h3 class="panel-title">Tip!</h3>
    </div>
    <div class="panel-body">
        <p>You can skip the file pathway altogether and instead just put in the name of the file in the parentheses and quotes IF your data and your notebook are located in the same file folder.</p>
    </div>
</div>

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 5 - Importing Data for Mac / Linux<a class="anchor" id="DS109L8_page_7"></a>

[Back to Top](#DS109L8_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Importing Data for Mac / Linux

These directions are for importing data on a Mac or Linux computer.  If you are working on a Windows computer, please move to the next section. 

---

## Importing a Package

First, you will import your very first Python package in Jupyter.  ```pandas``` is one of the best data science specific packages, and so you'll start there.  In your empty line, type: 

```python
import pandas as pd
```

This is telling Python that you want the ```pandas``` package, and that in your every day use, you will be abbreviating ```pandas``` as ```pd``` from now on.  After you've typed this in, press the shift and enter keys at the same time to run this cell, or use your mouse to click on the play button at the very top of the menu that conveniently says ```Run```. As soon as you do this, you will notice that the empty brackets next to the cell turn to a star.  That means Jupyter is processing the command.  Once it has finished processing, that star will turn to a number showing the order in which the cell was run, and a new empty cell will appear.

![Python import text box with the text import pandas as pd in it.](Media/Anaconda21.png)

Now that you've imported the ```pandas``` package, you want to bring in some data to work with! This is something that ```pandas``` excels at.  

---
## Importing Data

See if you can bring in **[this dataset](https://repo.exeterlms.com/documents/V2/DataScience/Data-Wrang-Visual/bikes.zip)** about bike sharing.  Here's some example code:

```python
bikes = pd.read_csv('../Data/bikes.csv')
```

The function ```pd.read_csv()``` comes from the ```pandas``` package (which you have abbreviated ```pd```), and is meant to read in data from a ```.csv``` format. You will need to change the directory route to find your own file.  You can find out this directory by navigating to the folder that holds your file in the terminal and then running the ```pwd``` command, which prints out the full pathway of the folder you are in.  Then you can copy this pathway into your Jupyter Notebook.

<div class="panel panel-info">
    <div class="panel-heading">
        <h3 class="panel-title">Would you rather not use the terminal?</h3>
    </div>
    <div class="panel-body">
        <p>On a Mac, select the file you want to use and then press Command + i to bring up the "Get Info" section.  Then copy the "Where" section using Command + c. Then you can paste that info into your Jupyter Notebook.</p>
    </div>
</div>

Unlike on Windows, you will not need to change which way the slashes are facing. Mac and Linux pathways are already written with this slash ( / ), which is called a forward slash, and that is the slash Python expects.  The other slash ( \ ) is called a backslash, and Python treats it as the start of a special character inside quotes, so make sure none sneak into your pathway. Don't forget to add the name of the actual file to the end, as well as the file extension! In this case, it is ```bikes.csv```. 

The code you see above is for ```.csv``` files, which is the predominant file type you'll use in this course.  However, if you wanted to bring in an MS Excel file, then instead of using the ```pd.read_csv()``` function for the file extension ```.csv```, you could use the ```pd.read_excel()``` function for the file extension ```.xlsx```.  Either works marvelously - it just depends on what format your data is in to begin with!

<div class="panel panel-info">
    <div class="panel-heading">
        <h3 class="panel-title">Tip!</h3>
    </div>
    <div class="panel-body">
        <p>You can skip the file pathway altogether and instead just put in the name of the file in the parentheses and quotes IF your data and your notebook are located in the same file folder.</p>
    </div>
</div>
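When a pathway does not seem to work, it often helps to ask Python which folder it is working from and whether the file is really where you think it is. The sketch below uses the standard library's `pathlib`; the `Data` folder next to the notebook's folder is a hypothetical layout that mirrors the `'../Data/bikes.csv'` example, so adjust it to match your own folders.

```python
from pathlib import Path

# The folder Python is currently working in. In Jupyter this is normally
# the folder that holds your notebook.
notebook_folder = Path.cwd()
print(notebook_folder)

# Hypothetical layout: go up one folder (that is what ".." means), then
# into a folder named Data, then to the file itself.
data_file = notebook_folder.parent / "Data" / "bikes.csv"

# exists() reports whether a file is really there before you try to read it.
print(data_file, "found" if data_file.exists() else "not found")
```

If the file is reported as not found, the pathway is the thing to fix before you look for problems anywhere else in your code.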

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 6 - Viewing Files in Pandas<a class="anchor" id="DS109L8_page_8"></a>

[Back to Top](#DS109L8_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Viewing Files in Pandas

You will now learn about some of the basic functions that ```pandas``` contains, such as how to view your data and how to compute basic statistics. 

To view your data file in Jupyter, all you need to do is type in the name of your data frame and run the cell, and it will appear! However, if you have a very large dataset, this can take a while and isn't best practice. So, ```pandas``` has some shortcuts for you. You can use ```.head()``` or ```.tail()``` to see the top and bottom of your dataset, respectively.  By default, if you don't enter anything into the parentheses, you will see 5 rows (the first 5 for ```.head()``` and the last 5 for ```.tail()```), but you can enter any number you want in those parentheses to see more or less data.

Here's what those look like: 

![Table showing the results of an import of pandas as pd.](Media/Anaconda25.png)

```.head()``` is for viewing the beginning of your data, and ```.tail()``` is for viewing the end of your data.
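Here is a small sketch you can run even before your own data is loaded. The data frame `demo` is a stand-in with ten made-up rows; with the real data you would call the same functions on `bikes`.

```python
import pandas as pd

# Stand-in data frame with ten made-up rows.
demo = pd.DataFrame({"instant": range(1, 11), "cnt": range(100, 200, 10)})

first_three = demo.head(3)   # the first 3 rows
last_two = demo.tail(2)      # the last 2 rows
default_view = demo.head()   # nothing in the parentheses: 5 rows

print(first_three)
print(last_two)
print(len(default_view))
```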

---

## See ALL the Data

If you viewed all your data, without specifying head or tail, you may have noticed that the row numbers jump from 29 to 701.  This is to conserve space, but if you want to see everything, there's a magic line of code to fix it: 

```python
pd.options.display.max_rows = None
```

For this dataset, there are only a few columns, and so most of you will probably see them all, depending on your monitor size, but if you couldn't and needed to, there's very similar code to ensure you can scroll through to see all the columns as well:  

```python
pd.options.display.max_columns = None
```
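Both of the settings above stay changed for the rest of your session. If you only want to lift the limit for a single look at your data, one option is `pd.option_context`, which changes a setting temporarily and then puts it back. This is a sketch on a made-up, 100-row stand-in data frame rather than the bikes data.

```python
import pandas as pd

# Stand-in data frame with 100 made-up rows.
demo = pd.DataFrame({"row": range(100)})

# Remember the current setting so it can be compared afterwards.
before = pd.get_option("display.max_rows")

# Inside the with-block, the row limit is lifted...
with pd.option_context("display.max_rows", None):
    full_view = repr(demo)

# ...and once the block ends, the earlier setting is back in place.
after = pd.get_option("display.max_rows")
print(before == after)
```

In a notebook you would normally put `display(demo)` inside the with-block instead of `repr(demo)`; `repr` is used here only so the result can be stored and inspected.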

---

## List out the Columns

You can also list out the names of your columns with the ```.columns``` attribute, like this: 

```python
bikes.columns
```

And here is the output you should receive: 

```text
Index(['instant', 'dteday', 'season', 'yr', 'mnth', 'holiday', 'weekday',
       'workingday', 'weathersit', 'temp', 'atemp', 'hum', 'windspeed',
       'casual', 'registered', 'cnt'],
      dtype='object')
```

---

## Get Row and Column Counts

You can also get counts of both your columns and your rows easily, with the function ```len()```.  If you just put in the name of the dataset, you will get the number of rows, like this: 

```python
len(bikes)
```

There are 731 rows of data, so it spits ```731``` back as your output.

If you put in the name of the dataset and then specify ```.columns```, you will get the number of columns instead: 

```python
len(bikes.columns)
```

And since this dataset has 16 columns, that is the output you will receive: ```16```.
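If you want both numbers at once, every data frame also has a `.shape` attribute that holds the row count and the column count together. The sketch below uses a small made-up data frame so it runs on its own; with the real data you would look at `bikes.shape`.

```python
import pandas as pd

# Made-up stand-in data: 4 rows and 3 columns.
demo = pd.DataFrame({
    "instant": [1, 2, 3, 4],
    "season": [1, 1, 2, 2],
    "cnt": [10, 20, 30, 40],
})

# .shape is a pair of numbers: (rows, columns).
n_rows, n_cols = demo.shape
print(n_rows, n_cols)  # 4 3
```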

---

In [3]:
try:
    display(L8P8Q1, L8P8Q2, L8P8Q3, L8P8Q4)
except:
    pass


![Anaconda35.PNG](attachment:aeb4d694-1fb5-493d-bae8-8c67f69d5a96.PNG)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 7 - Descriptive Statistics<a class="anchor" id="DS109L8_page_9"></a>

[Back to Top](#DS109L8_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Descriptive Statistics 

Now that you know the basics about viewing your data, you will get familiar with how to run descriptive statistics in Python!

---

## Frequencies

You can easily get counts (frequencies) of the values for a variable.  For instance, to get counts for the values in the month variable, you will specify the dataset (```bikes```), then a period, then the name of the variable (```mnth```), then another period, and finally the function ```value_counts()```:

```python
bikes.mnth.value_counts()
```

And here is the output you receive: 

```text
12    62
10    62
8     62
7     62
5     62
3     62
1     62
11    60
9     60
6     60
4     60
2     57
Name: mnth, dtype: int64
```

As you can see, it lists out the month on the left hand side (for example, 12 is December), and then on the right hand side gives you the count of how many rows there are for that month in the dataset.  At the bottom, it also tells you the type of data - in this case ```int64```, which means that the counts are stored as whole numbers (64-bit integers).
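By default the most common values come first, which is not always the order you want to read them in. The sketch below, on a short made-up list of month numbers, shows two variations that are often handy: sorting the result by month, and asking for proportions instead of counts.

```python
import pandas as pd

# Made-up month numbers standing in for bikes.mnth.
months = pd.Series([1, 1, 2, 3, 3, 3], name="mnth")

counts = months.value_counts()                 # most common value first
by_month = counts.sort_index()                 # reordered by month number
shares = months.value_counts(normalize=True)   # proportions instead of counts

print(by_month)
print(shares)
```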

---

## Numeric Descriptive Statistics

You can also get descriptive statistics easily on anything that is numeric, by using the ```.describe()``` function on the dataset and variable name:

```python
bikes.mnth.describe()
```

And here is the output you receive, which includes the *n* listed as ```count```, your mean, standard deviation as ```std```, the minimum value, all of your quartiles, and the maximum value. At the bottom, it also provides the name of the variable and the data type, which is ```float64```, meaning the results are stored as 64-bit decimal (floating point) numbers.

```text
count    731.000000
mean       6.519836
std        3.451913
min        1.000000
25%        4.000000
50%        7.000000
75%       10.000000
max       12.000000
Name: mnth, dtype: float64
```
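Each line of that summary can also be pulled out or computed on its own. This sketch uses a tiny made-up series so the numbers are easy to check by hand; the same calls work on `bikes.mnth`.

```python
import pandas as pd

# Five made-up values that are easy to check by hand.
values = pd.Series([1, 2, 3, 4, 5], name="demo")

summary = values.describe()

# Pieces of the summary can be picked out by their labels...
print(summary["mean"], summary["50%"], summary["std"])

# ...and match the individual functions for the same statistics.
print(values.mean(), values.median(), values.std())
```

The ```50%``` line is the median, so ```.describe()``` gives you the mean and the median side by side.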

---

## Mean and Median

You can also get values for all of your variables at once if you like. Try just running ```.mean()``` or ```.median()``` on your dataset at large: 

![Two tables showing the mean and median values of a list of numbers from two different datasets.](Media/Anaconda30.png)
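One thing to watch for: depending on your version of ```pandas```, running ```.mean()``` on a whole dataset that contains a text column (a date stored as text, for example) may give you an error instead of quietly skipping that column. A hedged workaround is to pass `numeric_only=True`. The data frame below is made up, with column names borrowed from the bikes data.

```python
import pandas as pd

# Made-up stand-in with one text column and two numeric columns.
demo = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02", "2011-01-03"],
    "temp": [0.2, 0.4, 0.9],
    "cnt": [10, 20, 60],
})

# numeric_only=True tells pandas to leave the text column out.
means = demo.mean(numeric_only=True)
medians = demo.median(numeric_only=True)

print(means)
print(medians)
```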

<div class="panel panel-success">
    <div class="panel-heading">
        <h3 class="panel-title">Fun Fact!</h3>
    </div>
    <div class="panel-body">
        <p>If you really like using Jupyter Notebook, it also supports R! Check out <a href="https://www.datacamp.com/community/blog/jupyter-notebook-r"> this website </a> to get started! </p>
    </div>
</div>

---

## Summary

In this lesson, you learned about some of the major differences between R and Python, and about data science specific Python usage versus typical programming usage.   You then got familiar with the IDE Jupyter Notebook, learned how to install and import packages, and started working with the famous ```pandas``` package for data science specific tasks.  

You should now be able to: 

* Install Python packages
* Import Python packages
* Bring in data sets and view them in ```pandas```
* Perform descriptive statistics easily in ```pandas```

---

In [None]:
try: 
    display(L8P9Q1, L8P9Q2)
except:
    pass

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 8 - Keyboard Shortcuts for Jupyter Notebook<a class="anchor" id="DS109L8_page_10"></a>

[Back to Top](#DS109L8_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Keyboard Shortcuts for Jupyter Notebook

You'll most likely decide you like Jupyter Notebook best for working in Python, and if you do, there are a couple good keyboard shortcuts that will really help increase your productivity! 

<table class="table table-striped">
    <tr>
        <th>Function</th>
        <th>Windows Shortcut</th>
        <th>Mac Shortcut</th>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Run cell</td>
        <td>shift + enter</td>
        <td>shift + enter</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Toggle to markdown mode</td>
        <td>escape m</td>
        <td>escape m</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Highlight multiple cells</td>
        <td>shift + j</td>
        <td>shift + j</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Undo</td>
        <td>ctrl + z</td>
        <td>cmd + z</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Copy</td>
        <td>ctrl + c</td>
        <td>cmd + c</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Paste</td>
        <td>ctrl + v</td>
        <td>cmd + v</td>
    </tr>
</table>

And, if you have already typed in code and realized it needs quotes, parentheses, brackets, etc. around it, you can highlight the entire section of code and then type the opening symbol, and the closing symbol will automatically populate around it.

<div class="panel panel-success">
    <div class="panel-heading">
        <h3 class="panel-title">Additional Info!</h3>
    </div>
    <div class="panel-body">
        <p>Check out this <a href="https://vimeo.com/522038528">live recorded workshop going over Jupyter Notebook shortcuts</a>!</p>
    </div>
</div>


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 11 - Key Terms<a class="anchor" id="DS109L8_page_11"></a>

[Back to Top](#DS109L8_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Key Terms 

Below is a list and short description of the important key terms learned in this lesson. Please read through it, and go back and review any concepts you do not fully understand. Great work!

<table class="table table-striped">
    <tr>
        <th>Keyword</th>
        <th>Description</th>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Integrated Development Environment (IDE)</td>
        <td>An application that bundles tools for writing, running, and debugging code in one place to improve the user experience.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Anaconda Python</td>
        <td>A distribution of Python specifically created for data scientists.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Jupyter Notebook</td>
        <td>An IDE for Python you'll be using throughout the course to display the product of your statistical coding in real time.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Package</td>
        <td>Like a library in R, a package is a pre-made set of functions made to perform specific tasks in Python.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>pandas</td>
        <td>A Python package meant to help data scientists quickly and easily manipulate their data frames.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Markdown</td>
        <td>A lightweight markup language used to format text for display on the web. Used for annotating your code in Jupyter Notebook.</td>
    </tr>
</table>

---

# Key Pandas Functions

<table class="table table-striped">
    <tr>
        <th>Function</th>
        <th>Description</th>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>pd.read_csv</td>
        <td>Reads in a CSV data file.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>pd.read_excel</td>
        <td>Reads in a MS Excel data file.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>.head()</td>
        <td>Shows the first five rows of your data frame by default; a number placed in the parentheses will return the specified number of top rows.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>.tail()</td>
        <td>Shows the last five rows of your data frame by default; a number placed in the parentheses will return the specified number of bottom rows.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>pd.options.display.max_rows = None</td>
        <td>Shows all the rows of your data frame.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>pd.options.display.max_columns = None</td>
        <td>Shows all the columns of your data frame.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>.columns</td>
        <td>Provides the names of all the columns in your data frame.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>len()</td>
        <td>Shows the number of rows of the data frame specified in the parentheses.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>len( .columns)</td>
        <td>Shows the number of columns of the data frame specified in the parentheses.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>.value_counts()</td>
        <td>Shows the frequency counts of a particular column in a dataset.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>.describe()</td>
        <td>Provides the n, mean, standard deviation, minimum value, quartiles, and maximum value for a particular column in a dataset.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>.mean()</td>
        <td>Returns the average for all numeric columns in a dataset.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>.median()</td>
        <td>Returns the median for all numeric columns in a dataset.</td>
    </tr>
</table>
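
The functions in the table above can be tried together on a small data frame. The following is a minimal sketch rather than part of the original lesson: the data, the column names (`species`, `length`), and the variable name `animals` are invented for illustration, and the CSV is read from an in-memory string so that no file is needed.

```python
import io              # standard library; lets a string stand in for a file
import pandas as pd

# Hypothetical data, invented for this example only
csv_text = """species,length
bear,2.1
wolf,1.5
bear,2.4
fox,0.9
wolf,1.6
bear,2.0
"""
animals = pd.read_csv(io.StringIO(csv_text))

# Display options: None removes the limit on rows / columns shown
pd.options.display.max_rows = None
pd.options.display.max_columns = None

print(animals.head(3))          # first three rows
print(animals.tail(2))          # last two rows
print(animals.columns)          # names of the columns
print(len(animals))             # number of rows
print(len(animals.columns))     # number of columns

# Frequency counts for one column
print(animals["species"].value_counts())

# Summary statistics for one numeric column
print(animals["length"].describe())
print(animals["length"].mean())
print(animals["length"].median())
```

Selecting a single column first, as in `animals["length"]`, keeps the statistics focused on that column; this is one possible approach, and the same methods can also be called on a whole data frame.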

---

# Markdown Symbols

<table class="table table-striped">
    <tr>
        <th>Symbol</th>
        <th>Description</th>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap># </td>
        <td>First level heading.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>##</td>
        <td>Second level heading.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>###</td>
        <td>Third level heading.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>* </td>
        <td>Bullet point.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>---</td>
        <td>Horizontal dividing line.</td>
    </tr>
</table>

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 12 - Lesson 8 Hands-On<a class="anchor" id="DS109L8_page_12"></a>

[Back to Top](#DS109L8_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Jupyter Notebook and Pandas Hands-On

This Hands-On will be graded.  The best way to become a data scientist is to practice!

<div class="panel panel-danger">
    <div class="panel-heading">
        <h3 class="panel-title">Caution!</h3>
    </div>
    <div class="panel-body">
        <p>Do not submit your project until you have completed all requirements, as you will not be able to resubmit.</p>
    </div>
</div>

You are working for an ecology company that has been tracking bison throughout North America. They've collected **[data on the location, number, genus, and species of bison](https://repo.exeterlms.com/documents/V2/DataScience/Data-Wrang-Visual/BisonTracking.zip)**. They'd like to know some basic information about the bison to determine whether the species is still in danger or whether it is recovering.

Please perform the following tasks: 

* Read in your data as a CSV file
* Look at the first seven rows of your data
* Look at the last ten rows of your data
* Determine the number of rows and columns your dataset has 


And answer the following questions: 

* How many bison are of the species antiquus? 
* What are the mean and standard deviation of Length? 
* What is the median length of the bison?  

Please annotate your code with markdown to explain each step, then attach your .ipynb file or an HTML copy of your notebook here so your work can be graded.

<div class="panel panel-danger">
    <div class="panel-heading">
        <h3 class="panel-title">Caution!</h3>
    </div>
    <div class="panel-body">
        <p>Be sure to zip and submit your entire directory when finished!</p>
    </div>
</div>