# Intro to Human-Centered Data Science
## Setting up GitHub and JupyterHub
If you're reading this, you successfully forked the Git repo for this assignment from GitHub. Congratulations! 
Now that you've done that, you can see the first Jupyter Notebook for the class. To continue the assignment, let's practice a few tasks that come up often in data science.
But first, let's take a look at the Notebook itself. One of many great things about Jupyter Notebooks is that we can combine text, programming code, and visualizations in the same file. 

### Markdown
The text you're reading now is in a format called [Markdown](https://www.markdownguide.org). Markdown is a simple language that lets you add formatting to plain text documents. You're used to formatting text by selecting words or phrases and then choosing a format, e.g., bold, italic, heading, etc. In Markdown, you don't apply formats this way. Instead, you add special characters to the text to specify how it should appear.

For example, the first line in this document is a heading. To make it appear large, we put a number sign (`#`) in front of it (e.g., `# Intro to Human-Centered Data Science`). More number signs, such as `##` or `###`, make smaller headings. You can make text _italic_ by adding an asterisk or underscore before and after it (e.g., `_Jupyter Notebooks are great!_`). Want text in **bold**? Add two asterisks before and after the text (e.g., `**my bold text**`).

A Jupyter Notebook cell can be set to edit and display text with Markdown formatting. It's a useful way to add documentation to your project with all the benefits of formatted text. [Here's](https://www.markdownguide.org/cheat-sheet/) a useful cheat sheet that describes Markdown syntax. Let's practice some of the syntax you'll use most often.


Using the [cheat sheet](https://www.markdownguide.org/cheat-sheet/) as a guide, write a short introduction to yourself in the cell below, using the following Markdown formatting:
1) Include a *heading* with your name.
2) Tell us where you did your undergraduate degree. Format the university name in **bold** and *italicize* your major.
3) Tell us why you decided to study data science. Format your response as a *blockquote*.
4) Name three of your hobbies using a *numbered* list.
5) Provide the name of a website that you often visit and include a *hyperlink* to that site.

### ENTER YOUR TEXT WITH MARKDOWN HERE  


### Importing Python libraries
We learned earlier that the Python programming language has many libraries that provide useful tools for free. We use Python's `import` statement to include a library in a Notebook. For example, the statement `import math` makes Python's math library available in our Notebook. Python loads the library as what's called a _module_. Think of a module as a container filled with lots of math functions and constants. We use what's called _dot notation_ to access the functions inside the module. Look at the code below for an example:


```python
import math
print(math)
```

```
<module 'math' from '/opt/tljh/user/lib/python3.10/lib-dynload/math.cpython-310-x86_64-linux-gnu.so'>
```


We used `import` to bring the math library into our Notebook. When we print `math`, Python shows what kind of object it is and the file it was loaded from. Don't worry too much about this except to note that it is a module. Now let's access the constant `pi` within the module.

```python
print(math.pi)
```

```
3.141592653589793
```


There's the dot notation: we write `math.pi` to access the constant π (about 3.14159). What happens if we try to ask for `pi` _without_ the math module?

```python
print(pi)
```

```
NameError: name 'pi' is not defined
```

In the Notebook, this message appears in a red box: that's Python's way of signaling an error. As the message says, the bare name `pi` is not defined. But `math.pi` is defined and ours to use once we import `math`. In fact, we can call lots of math functions using dot notation:

```python
print(f"The factorial of 5 = {math.factorial(5)}")
print(f"90 degrees in radians = {math.radians(90)}")
print(f"The sine of 90 degrees = {math.sin(math.radians(90))}")
print(f"We can represent non-numbers with {math.nan}")
```

```
The factorial of 5 = 120
90 degrees in radians = 1.5707963267948966
The sine of 90 degrees = 1.0
We can represent non-numbers with nan
```


Don't worry too much about the actual math. The important thing to note here is that we import libraries as modules, and we access the things inside them with `module_name.function_or_constant_name`.
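To see what dot notation can reach inside a module, you can use Python's built-in `dir()` to list a module's names. The built-in `callable()` tells functions apart from constants. Here is a minimal sketch that uses only the standard `math` module. The variable name `module_names` is just for illustration.

```python
import math

# An imported library is a module object.
print(type(math))                    # <class 'module'>

# dir() lists the names stored inside the module; these are what dot notation can reach.
module_names = dir(math)
print("pi" in module_names)          # True
print("factorial" in module_names)   # True

# Constants are plain values: use them without parentheses.
print(math.e)                        # 2.718281828459045

# Functions are called with parentheses and arguments.
print(math.sqrt(16))                 # 4.0

# callable() is True for functions and False for constants.
print(callable(math.sqrt))           # True
print(callable(math.pi))             # False
```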
Sometimes we only want to import a few functions or constants from a library. As an example, imagine we only need the constant `pi` and the function `pow` from the math library. We can do the following to just get those two:

```python
from math import pi, pow
print(pi)
print(f"2 to the 5th power = {pow(2, 5)}")
```

```
3.141592653589793
2 to the 5th power = 32.0
```
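The `from ... import` form changes which names exist in the Notebook. The minimal sketch below tries a bare `pi` before and after `from math import pi, pow`, recording the error message instead of stopping. It then compares `math.pow` with Python's `**` operator. The variable name `before` is just for illustration.

```python
# Before importing, the bare name `pi` does not exist, so Python raises a NameError.
try:
    before = pi
except NameError as err:
    before = f"NameError: {err}"
print(before)          # NameError: name 'pi' is not defined

# `from math import ...` puts `pi` and `pow` directly into our namespace,
# so they can be used without the `math.` prefix.
from math import pi, pow
print(pi)              # 3.141592653589793

# math.pow always returns a float, even for whole-number inputs...
print(pow(2, 5))       # 32.0
# ...while the ** operator keeps integers as integers.
print(2 ** 5)          # 32
```

Importing `pow` this way has one side effect worth knowing about. It hides Python's built-in `pow()` for the rest of the Notebook, because both use the same name.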


We can also import a library and give it an alias. For example, we will use a library named `pandas` **a lot** in this and other courses. You'll often see the pandas module given the nickname, or alias, `pd`. Here's how to do that:

```python
import pandas as pd
print(pd)
my_series = pd.Series([1, 3, 5, 7, 9])
print(my_series)
```

```
<module 'pandas' from '/opt/tljh/user/lib/python3.10/site-packages/pandas/__init__.py'>
0    1
1    3
2    5
3    7
4    9
dtype: int64
```


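You can see that we import pandas *as* `pd`. Then we can use dot notation on `pd` to use functions in the pandas library. We printed `pd` to make sure it's a module. Then we called `pd.Series` to create a Series, which is pandas' one-dimensional labeled array. In the printout, the left column holds the index labels (0 through 4) and the right column holds our numbers. The line `dtype: int64` tells us the values are 64-bit integers.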
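A Series is an object with its own methods, and you reach them with dot notation, just like a module's functions. Below is a minimal sketch that reuses the same values as `my_series`. The `labeled` Series and its labels `"a"`, `"b"`, `"c"` are made up for illustration.

```python
import pandas as pd

# Same values as above; the default index is 0, 1, 2, 3, 4.
my_series = pd.Series([1, 3, 5, 7, 9])

# Series methods are reached with dot notation.
print(my_series.sum())    # 25
print(my_series.mean())   # 5.0
print(my_series.max())    # 9

# The index can also be set explicitly; here the labels are hypothetical.
labeled = pd.Series([1, 3, 5], index=["a", "b", "c"])
print(labeled["b"])       # 3
```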

### Loading data from a file into pandas DataFrames

We'll often have data stored in a file or database that we want to load into a Notebook. pandas has a set of [input/output functions](https://pandas.pydata.org/docs/user_guide/io.html) that let us read and write data in HTML, comma-separated values (CSV), Excel, SQL databases, and other formats. Let's try a simple example with a CSV file.
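Before loading a real file, it can help to see how `read_csv` turns CSV text into a DataFrame. The minimal sketch below wraps a tiny, made-up CSV in `io.StringIO`, so it runs without any file on disk. The names `csv_text`, `df_small`, `round_trip`, `with_index_col`, and `clean` are hypothetical. The sketch also shows one common way an `Unnamed: 0` column appears: a DataFrame is saved along with its index and then read back. Whether that explains any particular file, such as `student-scores.csv`, is an assumption this sketch can't confirm.

```python
import io

import pandas as pd

# A tiny, made-up CSV held in memory so the example needs no file.
csv_text = """id,first_name,math_score
1,Ana,88
2,Ben,75
"""

# read_csv accepts a path or any file-like object, such as io.StringIO.
df_small = pd.read_csv(io.StringIO(csv_text))
print(df_small)

# to_csv() with no path returns the CSV as a string. By default it also writes
# the row index as an unnamed first column, which reads back as "Unnamed: 0".
round_trip = pd.read_csv(io.StringIO(df_small.to_csv()))
print(list(round_trip.columns))      # ['Unnamed: 0', 'id', 'first_name', 'math_score']

# Two ways to avoid that extra column:
# 1) tell read_csv the first column is the index...
with_index_col = pd.read_csv(io.StringIO(df_small.to_csv()), index_col=0)
print(list(with_index_col.columns))  # ['id', 'first_name', 'math_score']

# 2) ...or leave the index out when writing.
clean = pd.read_csv(io.StringIO(df_small.to_csv(index=False)))
print(list(clean.columns))           # ['id', 'first_name', 'math_score']
```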
You should have a file named `student-scores.csv` in the repository that you forked from GitHub. Let's load that file using the pandas function `read_csv`.

```python
df = pd.read_csv("student-scores.csv")
df.head()
```

| Unnamed: 0 | id | first_name | last_name | email | gender | part_time_job | absence_days | extracurricular_activities | weekly_self_study_hours | career_aspiration | math_score | history_score | physics_score | chemistry_score | biology_score | english_score | geography_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Paul | Casey | paul.casey.1@gslingacademy.com | male | False | 3 | False | 27 | Lawyer | 73 | 81 | 93 | 97 | 63 | 80 | 87 |
| 1 | 2 | Danielle | Sandoval | danielle.sandoval.2@gslingacademy.com | female | False | 2 | False | 47 | Doctor | 90 | 86 | 96 | 100 | 90 | 88 | 90 |
| 2 | 3 | Tina | Andrews | tina.andrews.3@gslingacademy.com | female | False | 9 | True | 13 | Government Officer | 81 | 97 | 95 | 96 | 65 | 77 | 94 |
| 3 | 4 | Tara | Clark | tara.clark.4@gslingacademy.com | female | False | 5 | False | 3 | Artist | 71 | 74 | 88 | 80 | 89 | 63 | 86 |
| 4 | 5 | Anthony | Campos | anthony.campos.5@gslingacademy.com | male | False | 5 | False | 10 | Unknown | 84 | 77 | 65 | 65 | 80 | 74 | 76 |


Here's what we just did:
1) We called `pd.read_csv` with the name of our data file, student-scores.csv. 