# **The DS-100 Guide**


## Table of Contents
- [The Shared Computing Cluster (SCC)](#scc)
- [Vanilla Python](#python)
- [Packages](#packages)
    - [NumPy](#numpy)
    - [datascience](#datascience)


## **The Shared Computing Cluster (SCC)** <a id='scc'></a>

All students in DS-100 use the SCC to run the files required for their assignments. It is a heterogeneous Linux cluster whose components include over 9,000 shared CPU cores, 250 GPU cores, and 1 petabyte of storage. The SCC is mainly used for high-performance computing tasks in disciplines such as engineering, biostatistics, and machine learning.


### **Accessing the SCC**

1. Ensure that you are not on Boston University’s “BU (802.1x)” Wi-Fi network
2. Go to [the SCC site](https://scc-ondemand.bu.edu)
3. Log in to the SCC with your Kerberos username and password (the account you use for the Student Link)
4. Optional: Save/bookmark this page as you will be accessing it often



## **Python** <a id='python'></a>


### **Data Types**

#### Int (integer) (Lec #4)

An int in Python is a whole number of any size. It never has a decimal point.

In [93]:
x_int = 1234
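
As a minimal sketch of the "any size" claim above (the variable name big_int is made up for illustration), a very large power of two is still an exact int:

```python
# Python ints grow as needed, so large values do not overflow or lose digits.
big_int = 2 ** 100

print(big_int)        # 1267650600228229401496703205376
print(type(big_int))  # <class 'int'>
```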

#### Float (number with a decimal point) (Lec #4)

Floats always have a decimal point, though the fractional part is optional. They have a limited size and a precision of about 15-16 significant digits. Floats **can be wrong** in the final few digits after arithmetic (due to how they are stored in memory, see __[here](https://stackoverflow.com/questions/21895756/why-are-floating-point-numbers-inaccurate)__).

In [94]:
x_float = 12.34
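
The rounding error described above can be seen with a small sum. This is a minimal sketch with made-up values; math.isclose from the standard library is one common way to compare floats and is an addition here, not something from lecture.

```python
import math

# 0.1 and 0.2 cannot be stored exactly, so their sum is slightly off.
total = 0.1 + 0.2

print(total)                     # 0.30000000000000004
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True: equal within a small tolerance
```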

#### Str (Strings & text) (Lec #4)
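
A minimal sketch of the str type, using made-up example values: strings can be written with single or double quotes, joined with +, and measured with len().

```python
first = 'Ethan'
last = "Chang"

# + joins strings end to end (concatenation)
full = first + ' ' + last

print(full)        # Ethan Chang
print(len(full))   # 11 (the space counts as a character)
print(type(full))  # <class 'str'>
```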

#### Arrays/Lists 


### **Assignment Statements** (Lec #3)

Assignment statements, denoted by a *single* =, bind a variable name to a value. Variable names are *case-sensitive* (e.g. the names 'A' and 'a' refer to different variables).

In [95]:
hw_due_date = 'On Thursday'
print(type(hw_due_date))

# The value and type can be overwritten

hw_due_date = ['On Thursday']
print(type(hw_due_date))

a = 0
A = 1
print(a == A)

<class 'str'>
<class 'list'>
False



### **Using Functions** (Lec #3)

A function is a block of code that runs whenever it is called. The variables listed inside the parentheses of a function's definition are parameters, while the actual values sent to the function when it is called are arguments. Additionally, some functions can take *multiple* arguments (e.g. np.array(), max(), min(), etc).

In [96]:
# From Lec #4
# type() is a function, while x_int and x_float are arguments in this case
# print() is also a function, and it can take more than one argument

print(type(x_int), type(x_float))

<class 'int'> <class 'float'>
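
To make the difference between parameters and arguments concrete, here is a minimal sketch. The function add_tax and its values are hypothetical and made up for illustration; they are not from lecture.

```python
def add_tax(price, rate):
    # price and rate are parameters: names used inside the function
    return price * (1 + rate)

# 100 and 0.25 are arguments: the actual values sent to the function
total = add_tax(100, 0.25)
print(total)            # 125.0

# Built-in functions such as max() can take several arguments
largest = max(3, 7, 5)
print(largest)          # 7
```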


It is also possible to daisy-chain functions together, with each call operating on the result of the previous one. In the example below, lower() converts all the letters to lowercase, and replace() then replaces *all* instances of 'han' with 'c'.


In [97]:
name = 'Ethan Chang'
weird_name = name.lower().replace('han', 'c')
print(name)
print(weird_name)

Ethan Chang
etc ccg



### **Conversions** (Lec #4)
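
As a minimal sketch, assuming this section refers to Python's built-in int(), float(), and str() functions, a value can be converted from one data type to another. The variable names below are made up for illustration.

```python
# str -> int: the text '1234' becomes a number that supports arithmetic
as_int = int('1234')
print(as_int + 1)          # 1235

# str -> float
as_float = float('12.34')
print(as_float)            # 12.34

# int -> str: the number becomes text, so + joins instead of adding
as_str = str(1234)
print(as_str + '5')        # 12345

# float -> int drops the fractional part (it does not round)
truncated = int(12.99)
print(truncated)           # 12
```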


## **Importing packages** <a id='packages'></a>

In [98]:
import numpy as np
from datascience import *


## **Package: NumPy** <a id='numpy'></a>

### **NumPy Functions**

#### np.array
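
A minimal sketch of np.array with made-up values: it builds an array from a list, and arithmetic on the array is applied to every element.

```python
import numpy as np

scores = np.array([90, 85, 70])
print(scores)          # [90 85 70]

# Arithmetic applies element by element
curved = scores + 5
print(curved)          # [95 90 75]
```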

#### np.arange
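
A minimal sketch of np.arange with made-up values: it returns evenly spaced numbers from a start value up to, but not including, a stop value.

```python
import numpy as np

# One argument: count from 0 up to (not including) 5
first_five = np.arange(5)
print(first_five)      # [0 1 2 3 4]

# Three arguments: start, stop, step
stepped = np.arange(2, 10, 3)
print(stepped)         # [2 5 8]
```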

#### np.diff
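
A minimal sketch of np.diff with made-up values: it returns the difference between each element and the one before it, so the result is one element shorter than the input.

```python
import numpy as np

values = np.array([1, 4, 9, 16])

# 4 - 1, 9 - 4, 16 - 9
changes = np.diff(values)
print(changes)         # [3 5 7]
print(len(changes))    # 3
```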


## **Package: datascience** <a id='datascience'></a>

### **Creating a table**


In [99]:
ca_table = Table().with_columns(
    'Label', ['Row 1', 'Row 2', 'Row 3', 'Row 4'],
    'Course Assistants', ['Ethan Chang', '?', '?', '?'],
    'Major', ['Data Science', 'N/A', 'N/A', 'N/A']
)

ca_table

Label,Course Assistants,Major
Row 1,Ethan Chang,Data Science
Row 2,?,N/A
Row 3,?,N/A
Row 4,?,N/A


### **Table Structure** (Lec #3)

If you run the code cell below, you will see a table with 4 rows and 3 columns. The name of each column (also known as its label) always appears at the top, and that column's entries appear below it.

In [100]:
ca_table

Label,Course Assistants,Major
Row 1,Ethan Chang,Data Science
Row 2,?,N/A
Row 3,?,N/A
Row 4,?,N/A


### **Table Operations**

#### t.select(label) (Lec #3)

#### t.drop(label) (Lec #3)

#### t.sort(label) (Lec #3)

#### t.where(label, condition) (Lec #3)