# UBC
## Programming in Python for DS

### Week 1
Instructor: Socorro Dominguez-Vidana

## How does this Course Work?

4 Main Components:
- [Course Material](https://prog-learn.mds.ubc.ca/en/)
- JupyterHub for assignments
- Piazza for Questions
- Office Hours: 2 hours a week, recorded (attendance is optional)

### Asking Questions Etiquette:
- If you have a question regarding the assignment, **do not** email your instructor.
    - The idea is that answers are shared with the whole class rather than with individual students.
- Instead, go to Piazza and see if there is a discussion regarding that question already.
- If not, ask the question on Piazza. You can ask anonymously.

- Email your instructor **only** about your grades or a personal matter that needs to be discussed. Emails about assignment questions may not be answered, or may not be answered in a timely manner.

### Piazza?

- Piazza is similar to a website called StackOverflow.

- Its main purpose is to communicate questions among students and instructors.
    - Collaboration in Data Science is extremely important.

- Try working alone first on a problem. If you have worked on it for 15-30 minutes and you cannot figure it out, **ASK** on Piazza.
  
- You might want to ask a question on Piazza but not know how.

    - The more details you provide, the more likely the instructors and your peers will be able to help you appropriately.

    - **Avoid** screenshots or uploaded pictures. Instructors and peers may not always be able to open 'external' files, so your question might be skipped.  

    - Instead, use ``` to start a code chunk, then copy and paste the code you are running and the full error message you get.

#### Example

> **Title: Assignment 1 Question 3.a**  
> Hi all, I am having trouble with Assignment 1 Question 3.a  
> This is the code that I am trying:
>
> ```python
> x + y
> ```
> <br>
>
> I am receiving the error:
> ```output
> NameError: name 'x' is not defined
> ```
> <br>
>
> I do not understand what the error means, could someone help?

This way we can all help.
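
If you are unsure how to grab the full error text for a post like the one above, the following is a minimal sketch using only Python's standard library. The variable name `error_text` is illustrative, and in a notebook you can usually just copy the error shown under the cell instead.

```python
import traceback

try:
    x + y  # neither name has been defined, so Python raises an error here
except NameError:
    # format_exc() returns the traceback as a string, ready to paste
    error_text = traceback.format_exc()

print(error_text)
```

Pasting the whole traceback, not only its last line, lets others see which line of your code failed.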

### Office Hours Expectations

- Students **must** bring in questions:
    - On Mondays we will do a Module preview.
    - On Wednesdays, Office Hours rely entirely on the questions you bring.
<br> 

- Try to keep your mic muted but your camera on (at least when you are speaking). Stay engaged.

- Although attendance is optional, you are strongly encouraged to at least watch the recordings, since announcements might be made during Office Hours.

- To ask a question in Office Hours, be ready to share your screen and walk us through where you are stalled. If you have worked through the solution partially, also give us a tour :) Do not expect to simply be told how to solve the exercise; you need to share your screen.

- If students have no questions, Office Hours may end early. Come prepared.

### Assignment Expectations
- You will have one assignment per week. 
    - The first submission is on Sunday (the first assignment is more of a walkthrough).  
    <br> 

- **Do not** change the names of the files. It might be tempting to add your name, but please do not, as this may cause problems with grading.

- Avoid uploading data or unrelated material to the server. This might cause the server to crash.

- Try to submit on time. We will talk more about late submissions next week. But try to stay on track as much as possible. You will all receive an email next week regarding late submissions. If you submitted your first assignment on time, don't worry about receiving the email; it is just for your reference.

- All assignments (except for the Final Project) are submitted on the server only. To submit, simply save your changes to the file; that is all.

### Hints on How to Navigate the Course

- Read the Goals of the Module.

- Read the Assignment (so that you have an idea of what you are expected to do)

- Watch the videos and solve the exercises.

- Instead of waiting to solve the whole assignment on the weekend, code along and try solving a bit every day.

- Try working on a problem for 15-30 minutes before asking on Piazza.

- If Piazza is not enough, bring your question to Office Hours.
    - Maybe you got the right answer but don't understand how/why it works. Bring the problem and we will discuss it.

### What is Python?

- Python is a widely used general-purpose, high-level programming language.

- Designed by Guido van Rossum and first released in 1991; it is now developed under the stewardship of the Python Software Foundation.

- Developed to allow programmers to express concepts in fewer lines of code.

- Object-oriented programming language (can model real-world entities). 

- Dynamically typed and interpreted: we don't need to compile it ourselves before running it.

- Python 3 was released in December 2008.
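
As a rough illustration of what "dynamically typed" means in practice, here is a minimal sketch (the variable names are only illustrative). The same name can refer to values of different types over time, and type errors surface when a line runs rather than in a separate compile step.

```python
value = 3                      # value currently refers to an int
first_type = type(value).__name__

value = "three"                # the same name can later refer to a str
second_type = type(value).__name__

print(first_type, second_type)

# Types are still checked, but only when the line actually executes.
try:
    result = value + 1         # adding an int to a str is not allowed
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```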

#### Python's Fun Facts

- First introduced in 1991 at the National Research Institute for Mathematics and Computer Science (CWI) in the Netherlands. [source](https://www.journaldev.com/34415/history-of-python-programming-language)

- Named after the comedy show Monty Python's Flying Circus (it's on Netflix).

- Python has become one of the most popular programming languages in the world. 
    - This makes a career in Python a great choice. Not just for Data Science/Analytics.

### Why Python for Data Science?

- Fast programming language to pick up - from a syntax point of view.
    - We will use Python in a functional style (writing and calling functions) rather than as an OOP language.

- Active community with a vast selection of libraries (such as pandas and Altair) and resources.

- Professionals working with Data Science applications want to focus on insights rather than on complications of language.

### What is Jupyter?

- It is an IDE (integrated development environment)

- We can use Python via Jupyter.

- You can think of Python like a car's engine, while Jupyter is like a car's dashboard.
    - Python is the programming language that runs computations
    - Jupyter is the IDE that provides an interface by adding convenient features and tools.
    
- We can use other programming languages with Jupyter (R, Julia, MATLAB, ...)

### Why Jupyter?

- In Jupyter we can write code, create plots, and format text and equations in a single document.

- Allows us to run Python code interactively.

- Notebooks are great for exploration and for documenting a complete workflow.

- Notebooks can be shared in a human-readable format:
    - Share online with nbviewer.jupyter.org
    - GitHub: any notebooks you upload are rendered automatically on the site.
    - Convert to HTML, PDF, etc.

### *Course Requirements?*

For this course, you do not need to install anything.  

The Jupyter server that loads when you start an assignment suffices for this course.

If you want to install it on your computer, follow these instructions:
[MDS Installation Guide](https://ubc-mds.github.io/resources_pages/installation_instructions/)

### Characteristics of Notebooks

- A notebook consists of a series of "cells":
    - Code cells: execute snippets of code and display the output
    - Markdown cells: formatted text, equations, images, and more

```python
# Code Cell

x = 3
x + 6
```

```output
9
```

```python
import pandas as pd
```

```sql
# Markdown Cell
SELECT * from table
```

```python
print("hello")
```

$\sum x + y = 10$

Note: By default, a new cell is always a code cell.

### Python Data Science Ecosystem

- Python has many uses: 
    - Web development
    - Automation or scripting
    - Software testing and prototyping
    - Everyday tasks
    - Data Analysis & Data Science
    
- Python has built-in functions. But that is not enough for us (we don't want to reinvent all functions).

- The Python libraries for data science are developed and maintained by external "3rd party" development teams

- Python core + 3rd party libraries = **ecosystem**
    - To install and manage 3rd party libraries, you need to use a package manager such as conda (which comes with Anaconda/Miniconda) - More on this in the DS Toolbox
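
To make the point about built-ins concrete, here is a minimal sketch using only functions that ship with Python. The list `heights` is a small made-up sample, not course data.

```python
# A small illustrative list of player heights in centimetres (made-up sample).
heights = [193, 191, 188, 188, 193]

count = len(heights)               # built-in: number of items
tallest = max(heights)             # built-in: largest value
mean_height = sum(heights) / count # no built-in mean, so we compute it by hand

print(count, tallest, mean_height)
```

Built-ins cover simple cases like these, but tasks such as reading a CSV file into a table or summarizing by group are where libraries like pandas save us from writing everything ourselves.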
    

During the program, we will be working with pandas, NumPy, and Altair -  
**You do not need to install anything.** You will all be working from the server.

## Tricks with Notebooks

```python
# This is a code cell

x = 5
3 + x  # Shows output
```

```output
8
```

```python
y = 3
```

### Writing a formula 
- Render with LaTeX using `$`

Write:

> ```markdown
> $x + y = 8$
> ```

The output is:

> $x + y = 8$

- Loading an image:

```markdown
![](image_path)
```

To write a chunk of code in Markdown (it is displayed but not executed), type:
```markdown
    ```python
        print("hello world!")
    ```
```


This renders as:

```python
print("hello world!")
```

Write variable names from the document between backticks:

`x`

```python
z + y
```

```output
6
```

```python
z = 5
```

## How can I follow along with the code in the videos?

The cells are not working, or I want to follow what is happening in the videos. How can I do that?

Go to [https://prog-learn.mds.ubc.ca/en/](https://prog-learn.mds.ubc.ca/en/) and scroll to the bottom of the page until you see Source; click on [Source](https://github.com/UBC-MDS/programming-in-python-for-data-science)

There is a folder called **data**, where you can access all the datasets for the course. 

Open any of the datasets and click on **Raw**; you can then pass the resulting URL to `pd.read_csv()`. For example:

```python
import pandas as pd

pd.read_csv('https://raw.githubusercontent.com/UBC-MDS/programming-in-python-for-data-science/master/data/canucks.csv')
```


| Unnamed: 0 | Player | No. | Age | Height | Weight | Country | Position | Experience | Birth Date | Salary |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Justin Bailey | 38 | 24 | 193 | 214 | United States | Forward | 4 | 01-Jul-95 | 700000.0 |
| 1 | Jay Beagle | 83 | 34 | 191 | 210 | Canada | Forward | 11 | 16-Oct-85 | 3200000.0 |
| 2 | Jordie Benn | 4 | 32 | 188 | 199 | Canada | Defense | 8 | 26-Jul-87 | 2400000.0 |
| 3 | Guillaume Brisebois | 56 | 22 | 188 | 175 | Canada | Defense | 1 | 21-Jul-97 | 700000.0 |
| 4 | Thatcher Demko | 35 | 24 | 193 | 192 | United States | Goalie | 2 | 08-Dec-95 | 900000.0 |
| 5 | Alexander Edler | 23 | 33 | 191 | 212 | Swedan | Defense | 13 | 21-Apr-86 | 7000000.0 |
| 6 | Loui Eriksson | 21 | 34 | 188 | 179 | Swedan | Forward | 13 | 17-Jul-85 | 5000000.0 |
| 7 | Adam Gaudette | 88 | 23 | 185 | 184 | United States | Forward | 2 | 03-Oct-96 | 925000.0 |
| 8 | Bo Horvat (C) | 53 | 24 | 183 | 215 | Canada | Forward | 5 | 05-Apr-95 | 5775000.0 |
| 9 | Quinn Hughes | 43 | 20 | 178 | 175 | United States | Defense | 1 | 14-Oct-99 | |
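
The `Unnamed: 0` column in the output above typically appears when the first header field of the file is empty. The sketch below assumes that is the case here; it uses a few rows copied from the output and held in a string (`csv_text`, an illustrative name) so it runs without a network connection. Passing `index_col=0` to `pd.read_csv` tells pandas to use that first column as the row index instead.

```python
import io

import pandas as pd

# A few rows copied from the output above, standing in for the remote file.
# Assumption: the first header field in the raw file is empty.
csv_text = """,Player,No.,Age,Height,Weight,Country,Position,Experience,Birth Date,Salary
0,Justin Bailey,38,24,193,214,United States,Forward,4,01-Jul-95,700000.0
1,Jay Beagle,83,34,191,210,Canada,Forward,11,16-Oct-85,3200000.0
9,Quinn Hughes,43,20,178,175,United States,Defense,1,14-Oct-99,
"""

# Default read: the unnamed first column is kept as a regular column.
default_read = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the first column becomes the row index.
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_read.columns)[:2])
print(list(indexed_read.columns)[:2])

# The empty Salary field is read as a missing value (NaN).
missing_salaries = int(indexed_read["Salary"].isna().sum())
print(missing_salaries)
```

The same `index_col=0` argument can be used when reading from the URL in place of the in-memory string.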
