# Overview
Data science has become an essential skill for supporting customers who use data to drive equipment performance, for example through:
- Fault Detection Classification (FDC)
- Statistical Process Control (SPC)
- First Time Right (FTR)

This course provides the skills needed to generate insight from most data files.

# Platform Introduction
This class uses Python.
Python is an interpreted, high-level, general-purpose programming language. Together with its libraries, it supports statistical computing and graphics, including:
- an effective data handling and storage facility,
- a suite of operators for calculations on arrays, in particular matrices,
- a large, coherent, integrated collection of intermediate tools for data analysis,
- graphical facilities for data analysis and display either on-screen or on hardcopy, and
- a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

## Cells
Notebooks are composed of units called cells. Each cell is defined as either code or plain text.  
The main thing to know for this class is how to run code cells.
To run a code cell, first select it (the selected cell is marked by a colored bar to its left).
Code cells are denoted with an "In [ ]:" tag to the left of the cell.
- Run the selected cell using "Ctrl+Enter".
- Run the selected cell and then select the next cell using "Shift+Enter".
- Run the selected cell and then create a new cell below it using "Alt+Enter".

Running a cell displays its output just beneath the cell. When the last line of the cell is an expression, its value is shown with an "Out[ ]:" tag to the left of it.

In [1]:
print("Ctrl+Enter should run this cell and stay here.")

Ctrl+Enter should run this cell and stay here.


In [2]:
print("Shift+Enter should run this and then select the next cell.")

Shift+Enter should run this and then select the next cell.


In [None]:
print("Shift+Enter should have brought you here!")

In [3]:
print("Alt+Enter should run this cell and then make a new cell below this one.")

Alt+Enter should run this cell and then make a new cell below this one.


In [None]:
print("Alt+Enter shouldn't have brought you here yet!")

---
# Basic Concepts

## Variables
Variables are containers for storing data values. It's kind of like algebra, where we can give different names to numbers or equations or functions.
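As a minimal sketch of this idea, the cell below assigns values to names and reuses them in a calculation. The names `width`, `height`, `label`, and `area` are illustrative only and are not used elsewhere in the course.

```python
# Assign values to names with "=" (all names here are illustrative)
width = 4
height = 2.5
label = "rectangle"

# Variables can be reused in later expressions
area = width * height

# A variable can be reassigned to a new value at any time
width = 10
new_area = width * height

print(label, area, new_area)
```

Note that `area` keeps its original value after `width` is reassigned; it was calculated once, at the time of assignment.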

## Data Types

### Numeric Types
**Integers** (`int`) are numbers that do not contain a decimal point.

In [None]:
integer = 3
type(integer)

---
**Floating point** (`float`) numbers are numbers that do contain a decimal point.

In [None]:
floating_point = 3.0
type(floating_point)

---
### String
A **string** (`str`) is a sequence of characters.

In [None]:
string = "hello"
type(string)

Many programming languages differentiate data types between individual characters and strings. Python, however, does not differentiate between these -- a character is just a single-element string.

In [None]:
char = 'a'
type(char)

---
## Data Structures

### Lists
A **list** (`list`, denoted with square brackets `[]`) is a data structure containing multiple elements, indexed numerically.

In [None]:
list_of_odds = [1,3,5,7,9]
type(list_of_odds)

Individual elements of a `list` can be accessed by their index using square brackets (`[]`).

In [None]:
list_of_odds = [1,3,5,7,9]

list_of_odds[0]

Note that the index starts with **0, NOT** with 1.  
Attempting to access an invalid index will cause an error.

In [None]:
list_of_odds = [1,3,5,7,9]

list_of_odds[10]

Values in a `list` can be modified. They can be added to, removed from, or replaced/changed.
Add values to the end of a `list` using the `.append()` method.

In [None]:
list_of_odds = [1,3,5,7,9]

list_of_odds.append(11)
list_of_odds

Delete values using the `del` statement (short for "delete").

In [None]:
list_of_odds = [1,3,5,7,9]

del list_of_odds[2]
list_of_odds

Modify a value in a `list` by assigning a new value to its index.

In [None]:
list_of_odds = [1,3,5,7,9]

list_of_odds[0] = 5
list_of_odds
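The three operations above can be combined on a single list. The following is a small sketch using a hypothetical list named `readings`; it also uses the index `-1`, which Python interprets as "the last element".

```python
# A hypothetical list used only for this illustration
readings = [1, 3, 5, 7, 9]

readings.append(11)    # add to the end        -> [1, 3, 5, 7, 9, 11]
del readings[0]        # remove first element  -> [3, 5, 7, 9, 11]
readings[0] = 2        # replace first element -> [2, 5, 7, 9, 11]

# After a deletion, the remaining elements shift to fill the gap,
# so index 0 now refers to what used to be at index 1.
last = readings[-1]    # a negative index counts from the end

print(readings, last, len(readings))
```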

### Tuples
A **tuple** (`tuple`, denoted using parentheses `()`) is also a numerically indexed data structure of multiple elements. 

In [None]:
tuple_of_evens = (2,4,6,8)

type(tuple_of_evens)

Individual elements of a `tuple` can be accessed by their index using square brackets (`[]`), just like in a `list`.

In [None]:
tuple_of_evens = (2,4,6,8)

tuple_of_evens[0]

However, the key difference between a `tuple` and a `list` is that a `tuple` is immutable (not modifiable).

In [None]:
tuple_of_evens = (2,4,6,8)

In [None]:
tuple_of_evens.append(10)

In [None]:
del tuple_of_evens[2]

In [None]:
tuple_of_evens[3] = 3
tuple_of_evens
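Each of the three cells above is expected to fail. As a sketch of what those failures look like, the code below catches two of the errors with `try`/`except` (a construct not otherwise covered in this session) and records their names. It then shows one common workaround: building a new `tuple` rather than changing the existing one.

```python
tuple_of_evens = (2, 4, 6, 8)
errors = []

# Tuples have no .append() method at all
try:
    tuple_of_evens.append(10)
except AttributeError as err:
    errors.append(type(err).__name__)

# Tuples do not support assigning to an index
try:
    tuple_of_evens[3] = 3
except TypeError as err:
    errors.append(type(err).__name__)

# Concatenation creates a new tuple and leaves the original untouched
longer = tuple_of_evens + (10,)

print(errors)
print(tuple_of_evens, longer)
```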

### Dictionaries
A **dictionary** (`dict`, denoted with curly braces `{}`) is a data structure containing multiple values indexed by keys. The relationship between the keys and values of a dictionary is typically written as `{'key':'value'}`.

In [None]:
dict_of_states = {'tx':'texas','ny':'new york','ca':'california'}
type(dict_of_states)

Values in a dictionary (`dict`) are accessed using their key and square brackets (`[]`).

In [None]:
dict_of_states = {'tx':'texas','ny':'new york','ca':'california'}

dict_of_states['tx']

Dictionaries are mutable. Assign a value to a key directly to add a new pair or modify an existing one.

In [None]:
dict_of_states = {'tx':'texas','ny':'new york','ca':'california'}

dict_of_states['ca'] = 'canada'
dict_of_states['fl'] = 'florida'
dict_of_states

Use `del` to delete a key-value pair.

In [None]:
dict_of_states = {'tx':'texas','ny':'new york','ca':'california'}

del dict_of_states['ca']
dict_of_states
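Accessing a key that does not exist with square brackets raises a `KeyError`. A minimal sketch of two ways to guard against that is shown below: the `in` operator checks whether a key is present, and the built-in `dict.get()` method returns a fallback value when it is not. The fallback string `'unknown'` is an arbitrary choice for this example.

```python
dict_of_states = {'tx': 'texas', 'ny': 'new york', 'ca': 'california'}

# "in" tests whether a key exists
has_tx = 'tx' in dict_of_states
has_fl = 'fl' in dict_of_states

# .get() returns the fallback instead of raising KeyError
fl_name = dict_of_states.get('fl', 'unknown')
tx_name = dict_of_states.get('tx', 'unknown')

print(has_tx, has_fl, fl_name, tx_name)
```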

### Values in a Data Structure
A data structure can hold values of any data type. This applies to all of the structures mentioned (`list`, `tuple`, and `dict`).

In [None]:
list_of_strings = ['abc','def','xyz']
list_of_strings

They can even contain other data structures.

In [None]:
list_of_lists = [[1,3,5,7,9],[2,4,6,8],['a','b','c','d']]
list_of_lists

They can also contain a mixture of types (something many traditional programming languages do not allow).

In [None]:
list_of_mixed_types = ['a',0,'b',3,[1,'c','xyz']]
list_of_mixed_types

In [None]:
dictionary = {'integer':5,'string':'word','char':'b','evens':tuple_of_evens,'lists':list_of_lists}
dictionary
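Values inside nested structures are reached by chaining square brackets, one pair per level. The sketch below reuses `list_of_lists` from above and adds a hypothetical dictionary named `record` to show the same idea with a key followed by an index.

```python
list_of_lists = [[1, 3, 5, 7, 9], [2, 4, 6, 8], ['a', 'b', 'c', 'd']]

# First bracket picks the inner list, second bracket picks the element
second_list = list_of_lists[1]
third_even = list_of_lists[1][2]

# A hypothetical record mixing a string and a list
record = {'name': 'sensor_a', 'values': [0.5, 0.7, 0.9]}
first_value = record['values'][0]

print(second_list, third_even, first_value)
```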

---
## Operators
### Arithmetic Operators
Arithmetic operators are as follows. We will use `x` and `y` to demonstrate them.

In [None]:
x = 5
y = 3

`+` adds two operands.

In [None]:
x = 5
y = 3

x + y

`-` subtracts the right operand from the left operand.

In [None]:
x = 5
y = 3

x - y

`*` multiplies two operands.

In [None]:
x = 5
y = 3

x * y

`/` divides the left operand by the right one (resulting in a float).

In [None]:
x = 5
y = 3

x / y

`%` is the modulo operator -- the remainder after dividing the left operand by the right.

In [None]:
x = 5
y = 3

x % y

`**` is an exponent -- left operand raised to power of the right.

In [None]:
x = 5
y = 3

x ** y
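To tie division and modulus together, the following sketch also uses `//` (floor division), an operator that is not covered above but is part of standard Python. It returns the whole-number part of a division, so for positive integers the quotient and the remainder together rebuild the original number.

```python
x = 5
y = 3

true_division = x / y    # always a float
quotient = x // y        # whole-number part of the division
remainder = x % y        # what is left over

# The quotient and remainder reconstruct the left operand
rebuilt = quotient * y + remainder

print(true_division, quotient, remainder, rebuilt)
```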

---
### Assignment Operators
Assignment operators are as follows. We will use `x` and `y` to demonstrate them.

`=` is direct assignment.

In [None]:
x = 5
y = 3

x = y
x

`+=` is equivalent to `x = x + y`.

In [None]:
x = 5
y = 3

x += y
x

`-=` is equivalent to `x = x - y`.

In [None]:
x = 5
y = 3

x -= y
x

`*=` is equivalent to `x = x * y`.

In [None]:
x = 5
y = 3

x *= y
x

`/=` is equivalent to `x = x / y`.

In [None]:
x = 5
y = 3

x /= y
x

`%=` is equivalent to `x = x % y`.

In [None]:
x = 5
y = 3

x %= y
x
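The compound assignment operators are convenient for updating a value step by step. Here is a short sketch using a hypothetical variable named `total`, with the value after each step noted in the comments.

```python
total = 0

total += 5    # 0 + 5   -> 5
total += 3    # 5 + 3   -> 8
total *= 2    # 8 * 2   -> 16
total -= 6    # 16 - 6  -> 10
total /= 4    # 10 / 4  -> 2.5 (division always produces a float)

print(total)
```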

---
### Logic/Comparison Operators
Logic/comparison operators are as follows. We will use `x`, `y`, `z`, and `w` to demonstrate them.

In [None]:
x = 5
y = 3
z = 5

`>` evaluates greater than -- true if the left operand is greater than the right.

In [None]:
x = 5
y = 3
z = 5

x > y

In [None]:
y > x

`<` evaluates less than -- true if the left operand is less than the right.

In [None]:
x = 5
y = 3
z = 5

x < y

In [None]:
y < x

`==` evaluates equal to -- true if left and right operands are equal.

In [None]:
x = 5
y = 3
z = 5

x == y

In [None]:
x == z

`!=` evaluates not equal -- true if left and right operands are not equal.

In [None]:
x = 5
y = 3
z = 5

x != z

In [None]:
x != y

`>=` evaluates greater than or equal to -- true if left operand is greater than or equal to the right.

In [None]:
x = 5
y = 3
z = 5

x >= y

In [None]:
x >= z

In [None]:
y >= x

`<=` evaluates less than or equal to -- true if left operand is less than or equal to the right.

In [None]:
x = 5
y = 3
z = 5

x <= y

In [None]:
x <= z

In [None]:
y <= x

`and` is true if both operands are true. For `bool` values, the bitwise operator `&` gives the same result.

In [None]:
x = True
y = False
z = True
w = False

x and y

In [None]:
x & z

`or` is true if either operand is true. For `bool` values, the bitwise operator `|` gives the same result.

In [None]:
x = True
y = False
z = True
w = False

x or y

In [None]:
y | w

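`not` is true if the operand is false (complements the operand). Note that `~` is bitwise inversion and does not negate a plain Python `bool`: `~True` evaluates to `-2`, which still counts as true. `~` does act as an element-wise "not" on boolean `pandas` and NumPy arrays.

In [None]:
x = True
y = False
z = True
w = False

not y

In [None]:
not x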
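As a caution, the sketch below compares `not` and `~` on a plain Python `bool`. It relies only on built-in behavior: `bool` is a subclass of `int`, so `~` performs integer bitwise inversion rather than logical negation. The variable names are illustrative.

```python
flag = True

logical = not flag    # logical negation of True
bitwise = ~flag       # bitwise inversion of the integer 1

# Any non-zero integer counts as true, so "~" does not negate a bool
still_truthy = bool(bitwise)

print(logical, bitwise, still_truthy)
```

For single `True`/`False` values, `not` is the operator to use.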

`^` is true if the operands differ from each other (XOR).

In [None]:
x = True
y = False
z = True
w = False

x ^ y

In [None]:
x ^ z

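`is` is true if both operands refer to the same object (identity). This is a stricter test than `==`, which compares values. Use `==` to compare values such as strings; recent Python versions warn when `is` is used with a literal.

In [None]:
list_a = [1,2,3]
list_b = list_a
list_c = [1,2,3]

list_a is list_b

In [None]:
list_a is list_c

`is not` is true if the operands are not the same object.

In [None]:
list_a is not list_b

In [None]:
list_a is not list_c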

`in` is true if the value of the left operand is found in the sequence of the right.

In [None]:
list_of_nums = [0,1,2,3]

2 in list_of_nums

In [None]:
5 in list_of_nums

`not in` is true if the value of the left operand is not found in the sequence of the right.

In [None]:
list_of_nums = [0,1,2,3]

5 not in list_of_nums

In [None]:
2 not in list_of_nums
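The `in` operator also works on other containers. As a brief sketch: on a string it checks for a substring, and on a dictionary it checks the keys, not the values. The dictionary `states` is a shortened, illustrative version of the one used earlier.

```python
# On a string, "in" looks for a substring
has_ell = 'ell' in 'hello'

# On a dictionary, "in" looks at the keys only
states = {'tx': 'texas', 'ny': 'new york'}
key_found = 'tx' in states
value_as_key = 'texas' in states

# To search the values, use .values()
value_found = 'texas' in states.values()

print(has_ell, key_found, value_as_key, value_found)
```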

---
## Utilities
In this section, we'll cover a couple of utilities that will be used to demonstrate other basic concepts. They are also useful for debugging in general.

### Printing
Displaying something in the output is referred to as "printing". To print, use the `print()` function.

In [None]:
print("Hello World!")
print(10)

You can `print` either values or variables.

In [None]:
greeting = "hello there!"

print (greeting)

In [None]:
x = 5

print (x)

### Import
Python allows its community to create and share modules and packages. A module can be thought of as a specialized set of functions that supports a specific purpose, and a package is a collection of modules.  

To import a module, use the command `import module`. A module must be imported before it is used; otherwise Python raises a `NameError`.

In [None]:
math.pi

In [None]:
import math
math.pi

We can use the command `from module import name` to import one specific name (such as a class or function) from a module.  
For example, we may only be working with dates for a certain program and don't need everything in the `datetime` module, so we can import just the `date` class from the `datetime` module.

In [None]:
date.today()

In [None]:
from datetime import date
date.today()
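The import styles can be compared side by side. The sketch below uses only the standard library; the alias `dt` is a conventional but arbitrary choice, and the fixed date is used just so the result is the same every time the cell is run.

```python
import math                 # access names as math.<name>
from math import sqrt       # access sqrt directly, without the prefix
import datetime as dt       # "as" gives the module a shorter alias

circumference = 2 * math.pi * 1.0
root = sqrt(16)

# With the alias, the date class is reached as dt.date
new_year = dt.date(2024, 1, 1)

print(circumference, root, new_year.year)
```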

### Pausing
Use the `time.sleep()` function to pause the program for a given number of seconds. To use this function, we have to import the `time` module.

In [5]:
import time

In [None]:
print ("hi")
time.sleep(5)
print("bye")

### Commenting
Another useful utility in Python is commenting. Comments begin with `#`; Python ignores everything from the `#` to the end of the line, which lets us interweave helpful explanations of what things are and how they work. Commenting is an extremely important part of programming, as it can be hard to read someone else's code without explanation.

In [None]:
# Greet the world
print ("Hello World!")
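Comments can take a few forms. The following sketch shows a full-line comment, an inline comment after code, and a line of code that has been "commented out" so that it does not run. The variable names and the approximate value of pi are illustrative.

```python
# A full-line comment describes the step that follows
radius = 2  # an inline comment can follow code on the same line

# print("this line is commented out, so Python skips it")

# Area of a circle, using an approximate value of pi
area = 3.14159 * radius ** 2

print(area)
```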

---
## Control Flow
Control flow is how we programmatically execute blocks of code conditionally and/or iteratively. We do this using logic checks and loops. A block of code is marked by a `:` at the end of the control flow statement, followed by lines indented to the same level. If this is still confusing, it will become clearer after some hands-on coding.

### `if` Statements
An `if` statement sets up a block of code to execute upon fulfillment of a certain condition.

In [None]:
a = 10
b = 5
if a > b:
    print ("a is greater than b")

In [None]:
a = 10
b = 5
if a < b:
    print ("a is less than b")

#### `if...else` Statements
An `if...else` statement adds a second block of code that executes when the condition is not fulfilled.

In [None]:
a = 10
b = 5
if a > b:
    print ("a is greater than b")
else:
    print ("a is not greater than b")

#### `elif` Statements
There can be multiple code blocks with multiple conditions using `elif`.

In [None]:
a = 10
b = 5
if a > b:
    print ("a is greater than b")
elif a < b:
    print ("a is less than b")
else:
    print ("a is equal to b")
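To see every branch run, the same `if`/`elif`/`else` chain can be wrapped in a function and called with different inputs. This is only a sketch: `compare` is a hypothetical helper written for this example, and defining functions with `def` is not otherwise covered in this session.

```python
def compare(a, b):
    """Return a message describing how a relates to b."""
    if a > b:
        return "a is greater than b"
    elif a < b:
        return "a is less than b"
    else:
        return "a is equal to b"

# Only the first matching branch runs for each call
print(compare(10, 5))
print(compare(5, 10))
print(compare(5, 5))
```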

---
### `for...in` Loops
A `for...in` loop is used to iterate through a sequence and execute a block of code (the `time.sleep(1)` function is used here to emphasize the iterative behavior of a loop).

In [6]:
for x in range(5):
    time.sleep(1)
    print (x)

0
1
2
3
4


In [None]:
for y in range(3,5):
    time.sleep(1)
    print (y)

In [None]:
greet = 'hello'
for z in greet:
    time.sleep(1)
    print (z)

In [None]:
greetlist = ['hello', 'hi', 'hey', 'howdy']
for word in greetlist:
    time.sleep(1)
    print (word)
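Loops are often combined with the operators and `if` statements covered earlier. As a small sketch, the loop below adds up a list of numbers and collects the even ones; the names `total` and `evens` are illustrative.

```python
list_of_nums = [1, 2, 3, 4, 5, 6]

total = 0
evens = []

for num in list_of_nums:
    total += num              # runs once for every element
    if num % 2 == 0:          # remainder of 0 means the number is even
        evens.append(num)

print(total, evens)
```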

### Indentation
In Python, indentation is extremely important. The indentation level is how Python is able to group statements together, such as which statements are executed in a `for` loop or `if` conditional block. For an example of bad indentation, let's look at our previous loop if we had indented it incorrectly:

In [None]:
greetlist = ['hello','hi','hey','howdy']
for word in greetlist:
    time.sleep(1)
print (word)

You'll see that this piece of code printed only the last word in the list, because the entire loop had already finished before Python reached the `print` statement. So please be careful with the indentation!
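The difference can be made visible by collecting results in lists instead of printing them. In this sketch, `inside` and `outside` are hypothetical names; the only difference between the two loops is the indentation of the `append` line.

```python
greetlist = ['hello', 'hi', 'hey', 'howdy']

# Indented: the append is part of the loop and runs once per word
inside = []
for word in greetlist:
    inside.append(word)

# Not indented: the append runs once, after the loop has finished,
# when "word" still holds the last item of the list
outside = []
for word in greetlist:
    pass  # placeholder body; the loop itself does nothing
outside.append(word)

print(inside)
print(outside)
```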

---
## DataFrames
A data frame is a table, or two-dimensional array-like structure, in which each column contains the values of one variable and each row contains one set of values, one from each column.  
A data frame has the following characteristics:
- The column names should be non-empty.
- The row names should be unique.
- The data stored in a data frame can be of numeric, categorical, or character (string) type.
- Each column should contain the same number of data items.

The `pandas` package provides data frame structures through its `DataFrame` class.

For a visual idea of what a `DataFrame` is, this is how Spyder displays a `DataFrame`.  
![DataFrame](img/01_DataFrame.PNG)  
Our next section and our first module will go more into depth regarding accessing and interacting with `DataFrame` structures.
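As a preview, here is a minimal sketch of building a small `DataFrame` from a dictionary of lists, where each key becomes a column name. It assumes `pandas` is installed; the column names and values are made up for illustration.

```python
import pandas

# Each key is a column name; each list holds that column's values.
# All lists must have the same length.
frame = pandas.DataFrame({
    'tool': ['A', 'B', 'C'],
    'temperature': [21.5, 22.1, 20.9],
    'passed': [True, True, False],
})

n_rows, n_cols = frame.shape              # (rows, columns)
column_names = list(frame.columns)
mean_temp = frame['temperature'].mean()   # average of one column

print(frame)
print(n_rows, n_cols, column_names, mean_temp)
```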

---
## Dot Notation
Dot notation is how we access instances, attributes, functions, and methods. We will use `pandas` and `DataFrames` to demonstrate this.

In [None]:
import pandas

### Instances/Objects
A `DataFrame` is a class of objects in `pandas`. To create a particular object/instance of `DataFrame`, we use dot notation.

In [None]:
df = pandas.DataFrame()
type(df)

### Functions
We also use dot notation to access a function inside of an imported package or module. We already saw this in `time.sleep`.  
Another example is to access the `read_csv()` function in the `pandas` library.

In [None]:
data = pandas.read_csv('data/session1/01_basics.csv')
data

### Attributes
Attributes are various properties of a particular instance/object.  
For example, `DataFrame`s have a `columns` attribute that lists the columns of the `DataFrame`. We can access attributes using dot notation.

In [None]:
data = pandas.read_csv('data/session1/01_basics.csv')
data

In [None]:
data.columns

### Methods
Methods are functions that belong to a particular object.
For example, `DataFrame`s have a `transpose` method that transposes the `DataFrame`. We can access methods using dot notation.

In [None]:
data = pandas.read_csv('data/session1/01_basics.csv')
data

In [None]:
data.transpose()
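The same dot notation applies to any object, not only to `pandas` ones. As a rough sketch, the hypothetical `Rectangle` class below has two attributes and one method. Defining classes is beyond the scope of this session; the point is only how the dot is used on the last few lines.

```python
class Rectangle:
    """A hypothetical class used only to illustrate dot notation."""

    def __init__(self, width, height):
        self.width = width      # attribute
        self.height = height    # attribute

    def area(self):             # method
        return self.width * self.height

rect = Rectangle(4, 2)          # create an instance of the class

w = rect.width                  # attribute: no parentheses
a = rect.area()                 # method: parentheses call it

print(type(rect).__name__, w, a)
```

Attributes are read without parentheses, while methods need parentheses to be called, just as `data.columns` and `data.transpose()` differ above.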

---
# Wrap-up
And with that, we're done with our basics session! Most of these concepts become very intuitive as you start to use them, and we'll dive right into that starting in our next session. Many of these concepts also translate to other programming languages, often with only minor syntax changes.