![data-x](https://raw.githubusercontent.com/afo/data-x-plaksha/master/imgsource/dx_logo.png) 

## Introduction to Data-X
Mostly basics about Anaconda, Git, Python, and Jupyter Notebooks

### Author: Alexander Fred Ojala

---


# Useful Links
1. Managing conda environments:
    - https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
2. GitHub:
    - https://readwrite.com/2013/09/30/understanding-github-a-journey-for-beginners-part-1/
    - https://readwrite.com/2013/10/02/github-for-beginners-part-2/
3. Learning Python (resources):
    - https://www.datacamp.com/
    - [Python Bootcamp](https://bids.berkeley.edu/news/python-boot-camp-fall-2016-training-videos-available-online)
4. Datahub: http://datahub.berkeley.edu/ (to run notebooks in the cloud)
5. Google Colab: https://colab.research.google.com (also running notebooks in the cloud)
6. Data-X website resources: https://data-x.blog
7. Book: [Hands-On Machine Learning with Scikit-Learn and TensorFlow](https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291/ref=sr_1_1?ie=UTF8&qid=1516300239&sr=8-1&keywords=hands+on+machine+learning+with+scikitlearn+and+tensorflow)

# Quick Review of Python Topics

### Why Python?
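Python's popularity has grown rapidly in recent years, and many state-of-the-art machine learning libraries (scikit-learn, TensorFlow, etc.) provide Python support.

<img src='https://zgab33vy595fw5zq-zippykid.netdna-ssl.com/wp-content/uploads/2017/09/growth_major_languages-1-1400x1200.png' width=600></img>

Source: https://stackoverflow.blog/2017/09/06/incredible-growth-python/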

### Check what Python distribution you are running

In [19]:
!which python  # works on Unix-like systems; on Windows, try !where python

/home/afo/anaconda3/envs/data-x/bin/python


In [20]:
# Check that it is Python 3
import sys # import built in package
print(sys.version)

3.6.6 |Anaconda custom (64-bit)| (default, Jun 28 2018, 17:14:51) 
[GCC 7.2.0]


## Python as a calculator

In [None]:
# Addition
2.1 + 2

In [None]:
# Multiplication
10*10

In [None]:
# Floor division
7//3

In [None]:
# True (floating point) division; in Python 2, 7/3 on integers returned 2
7/3

In [None]:
type(2)

In [None]:
type(2.0)

In [None]:
a = 3
b = 5
print (b**a) # ** is exponentiation

In [None]:
print (b%a)  # modulus operator = remainder
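
The cell below is a small sketch of how floor division, the modulus operator, and true division relate to each other. It uses the built-in `divmod`, which is not used elsewhere in this notebook, and the numbers are chosen only for illustration.

```python
# Floor division and modulus together recover the original number
q, r = divmod(7, 3)      # same as (7 // 3, 7 % 3)
print(q, r)              # 2 1
print(q * 3 + r == 7)    # True

# Floor division rounds toward negative infinity, not toward zero
print(-7 // 3)           # -3
print(int(-7 / 3))       # -2 (int() truncates toward zero)
```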

In [None]:
type(5) == type(5.0)

In [None]:
# boolean checks
a = True
b = False
print (a and b)

In [None]:
# conditional programming
if 5 == 5:
    print('correct!')
else:
    print('what??')

In [None]:
print (isinstance(1,int))

## String slicing and indices
<img src="resources/spam.png" width="480">

In [None]:
# Strings and slicing
x = "abcdefghijklmnopqrstuvwxyz"

In [None]:
print(x)

In [None]:
print(x[1]) # zero indexed

In [None]:
print (type(x))

In [None]:
print (len(x))

In [None]:
print(x)

In [None]:
print (x[1:6:2]) # start:stop:step

In [None]:
print (x[::3])

In [None]:
print (x[::-1])
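
A few more slicing patterns, as a quick sketch of the `start:stop:step` rules shown above. The variable name `alphabet` is only used here so that `x` is left untouched.

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"

print(alphabet[:3])      # abc  (first three characters)
print(alphabet[-3:])     # xyz  (last three characters)
print(alphabet[1:6:2])   # bdf  (indices 1, 3 and 5; the stop index is excluded)
print(alphabet[100:])    # empty string: an out-of-range slice does not raise an error
```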

### Manipulating text

In [None]:
# Triple quotes are useful for multiple line strings
y = '''The quick brown 
fox jumped over 
the lazy dog.'''
print (y)

### String operators and methods

In [None]:
# tokenize by space
words = y.split(' ')
print (words)

In [None]:
# remove newline characters
[w.replace('\n','') for w in words]
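
As an alternative sketch, calling `split()` without an argument splits on any run of whitespace, so the newline characters never end up in the tokens. The names `text` and `tokens` are introduced here only for illustration.

```python
text = "The quick brown \nfox jumped over \nthe lazy dog."

# split() with no argument splits on any whitespace, including newlines
tokens = text.split()
print(tokens)

# join() is the inverse operation: glue the tokens together with a separator
sentence = ' '.join(tokens)
print(sentence)  # The quick brown fox jumped over the lazy dog.
```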

<div class='alert alert-success'>TAB COMPLETION TIPS</div>

In [None]:
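words.  # press Tab after the dot to list the methods of a list

In [None]:
y.  # press Tab after the dot to list the methods of a string

In [None]:
str()  # press Shift+Tab inside the parentheses to show the docstring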

# Data Structures

## **Tuple:** Sequence of Python objects. Immutable.

In [None]:
t = ('a','b', 3)
print (t) 
print (type (t))
isinstance(t,tuple)
t[1]

In [None]:
t[1] = 2  # raises TypeError: tuples do not support item assignment
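
A minimal sketch of what immutability means in practice: the assignment fails with a `TypeError`, and the usual workaround is to build a new tuple. The names `point` and `updated` are made up for this example.

```python
point = ('a', 'b', 3)

try:
    point[1] = 2                  # item assignment is not allowed on tuples
except TypeError as err:
    print('TypeError:', err)

# Build a new tuple instead of modifying the old one
updated = point[:1] + (2,) + point[2:]
print(updated)                    # ('a', 2, 3)
print(point)                      # ('a', 'b', 3), the original is unchanged
```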

## **List:** Sequence of Python objects. Mutable.

In [None]:
y = list() # create empty list
type(y)

In [None]:
type([])

In [None]:
# Append to list
y.append('hello')
y.append('world')
print(y)

In [None]:
y.pop(1)

In [None]:
print(y)

In [None]:
# List addition (merge)
y + ['data-x']

In [None]:
# List multiplication
y*4

In [None]:
# list of numbers
even_nbrs = list(range(0,20,2)) # range has lazy evaluation
print (even_nbrs)

In [None]:
# supports objects of different data types
z = [1,4,'c',4, 2, 6]
print (z)

In [None]:
# list length (number of elements)
print(len(z))

In [None]:
# it's easy to know if an element is in a list
print ('c' in z)

In [None]:
print (z[2])  # print element at index 2

In [None]:
# traverse / loop over all elements in a list
for i in z:
    print (i)

In [None]:
# lists can be sorted in place,
# but not when they mix data types that cannot be compared
z.sort()  # raises TypeError in Python 3: str and int cannot be ordered

In [None]:
# remove the string 'c' (at index 2) so that only integers remain
z.pop(2)

In [None]:
z.sort() # now it works!
z
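
If removing elements is not an option, two other approaches are sketched below: sorting with a `key` function, or filtering by type first. The list `mixed` is a copy of the earlier example; which approach is appropriate depends on what ordering you actually want.

```python
mixed = [1, 4, 'c', 4, 2, 6]

# Option 1: compare the string representation of every element.
# sorted() returns a new list and leaves the original untouched.
by_text = sorted(mixed, key=str)
print(by_text)                  # [1, 2, 4, 4, 6, 'c']

# Option 2: keep only the integers, then sort numerically
numbers = sorted(v for v in mixed if isinstance(v, int))
print(numbers)                  # [1, 2, 4, 4, 6]
```

Note that sorting by `str` orders numbers as text, so `10` would sort before `2`.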

In [None]:
print (z.count(4))  # how many times is there a 4

In [None]:
# loop examples
for x in z:
    print ("this item is ", x)

In [None]:
# print with index
for i,x in enumerate(z):
    print ("item at index ", i," is ",  x )

In [None]:
# print all even numbers below 10
for i in range(0,10,2):
    print (i)

In [None]:
# a list comprehension is like f(x) for x as an element of a set X
# S = {x² : x in {0 ... 9}}
S = [x**2 for x in range(10)]
print (S)

In [None]:
# All even elements from S
# M = {x | x in S and x even}
M = [x for x in S if x % 2 == 0]
print (M)

In [None]:
# Matrix representation with Lists
print([[1,2,3],[4,5,6]]) # 2 x 3 matrix
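
A short sketch of how such a nested list can be indexed and transposed with plain Python. The names `matrix` and `transposed` are illustrative only; for real numerical work a NumPy array is usually the better tool.

```python
matrix = [[1, 2, 3],
          [4, 5, 6]]           # 2 x 3 matrix stored as a list of rows

print(matrix[1][2])            # 6  (row 1, column 2, both zero indexed)

# Transpose with zip(*...): the rows become columns
transposed = [list(col) for col in zip(*matrix)]
print(transposed)              # [[1, 4], [2, 5], [3, 6]]
```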

# Sets (collection of unique elements)

In [None]:
# a set is not ordered
a = set([1, 2, 3, 3, 3, 4, 5,'a'])
print (a)

In [None]:
b = set('abaacdef')
print (b) # not ordered

In [None]:
print (a|b) # union of a and b

In [None]:
print(a&b) # intersection of a and b

In [None]:
a.remove(5)
print (a) # the integer 5 has been removed
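
Beyond union and intersection, sets also support difference and symmetric difference. The sketch below additionally contrasts `remove()` with `discard()`; the sets are redefined here so the cell runs on its own.

```python
a = {1, 2, 3, 4, 'a'}
b = set('abaacdef')           # {'a', 'b', 'c', 'd', 'e', 'f'}

only_in_a = a - b             # difference: elements of a that are not in b
print(only_in_a == {1, 2, 3, 4})                       # True

in_one_only = a ^ b           # symmetric difference: in exactly one of the sets
print(in_one_only == {1, 2, 3, 4, 'b', 'c', 'd', 'e', 'f'})  # True

a.discard(99)                 # discard() silently ignores missing elements
try:
    a.remove(99)              # remove() raises KeyError for missing elements
except KeyError:
    print('99 is not in the set')
```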

# Dictionaries: Key Value pairs
Almost like JSON data

In [None]:
# Dictionaries, many ways to create them
# First way to create a dictionary is just to assign it
D1 = {'f1': 10, 'f2': 20, 'f3':25}              

In [None]:
D1['f2']

In [None]:
# 2. Create a dictionary with the dict() constructor and keyword arguments
D2 = dict(f1=10, f2=20, f3 = 30)
print (D2['f3'])

In [None]:
# 3. Another way, start with empty dictionary
D3 = {}
D3['f1'] = 10
D3['f2'] = 20
print (D3['f1'])

In [None]:
# 4. Start with a list of key-value tuples
y = [('f1', 10), ('f2', 40),('f3',60)]
D4 = dict(y)
print (D4['f2'])

In [None]:
# 5. From keys
keys = ('a', 'b', 'c')
D5 = dict.fromkeys(keys)               # new dict whose values are all None
print (D5['c'])

In [None]:
# Dictionaries can be more complex, i.e. a dictionary of dictionaries or of tuples, etc.
D5['a'] = D1
D5['b'] = D2
print (D5['a']['f3'])
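
To illustrate the "almost like JSON" remark, here is a small sketch that converts a nested dictionary to a JSON string and back with the standard library `json` module. The dictionary `nested` is made up for this example.

```python
import json

nested = {'a': {'f1': 10, 'f2': 20}, 'b': [1, 2, 3], 'c': None}

encoded = json.dumps(nested)        # dict -> JSON string
print(encoded)                      # {"a": {"f1": 10, "f2": 20}, "b": [1, 2, 3], "c": null}

decoded = json.loads(encoded)       # JSON string -> dict
print(decoded == nested)            # True
```

The differences are small but real: JSON keys are always strings, and `None` becomes `null`.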

In [None]:
# traversing by key
# keys must be immutable (hashable), e.g. numbers, strings or tuples
for k in D1.keys():
    print (k)

In [None]:
# traversing by values
for v in D1.values(): 
    print(v)

In [None]:
# traversing by key-value pairs (items)
for k, v in D1.items():                # tuples with keys and values
    print (k,v)
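
Three more dictionary idioms that come up often, shown as a short sketch: `get()` with a default value, a dictionary comprehension, and a membership test. `D1` is redefined so the cell is self-contained.

```python
D1 = {'f1': 10, 'f2': 20, 'f3': 25}

# get() returns a default instead of raising KeyError for a missing key
print(D1.get('f4', 0))      # 0

# Dictionary comprehension: build a new dict from an existing one
doubled = {k: 2 * v for k, v in D1.items()}
print(doubled)              # {'f1': 20, 'f2': 40, 'f3': 50}

# Membership tests check the keys, not the values
print('f1' in D1)           # True
print(10 in D1)             # False
```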

# User input

In [None]:
# input
# raw_input() was renamed to input() in Python v3.x
# The old input() is gone, but you can emulate it with eval(input())

print ("Input a number:")
s = input()  # returns a string
a = int(s)
print ("The number is ", a)
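
Because `input()` always returns a string, `int(s)` raises a `ValueError` when the user types something that is not an integer. One possible way to guard against that is sketched below; `parse_int` is a hypothetical helper name, not part of Python or of this course's code.

```python
def parse_int(text, default=None):
    """Return int(text), or default if text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default

print(parse_int("42"))       # 42
print(parse_int("4.2"))      # None (not an integer literal)
print(parse_int(" 7 \n"))    # 7 (surrounding whitespace is ignored by int())
```

In a notebook you could then call `parse_int(input())` and check for `None` before using the result.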

# Import packages

In [None]:
import numpy as np

In [None]:
np.subtract(3,1)
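
The same import syntax works for any module. The sketch below uses the standard library `math` module, so that it runs without installing anything, to show the three common forms of the `import` statement.

```python
import math                  # import the whole module
import math as m             # import under an alias, like `import numpy as np`
from math import sqrt        # import a selected name directly

print(math.floor(2.7))       # 2
print(m.sqrt(16))            # 4.0
print(sqrt(16) == m.sqrt(16))  # True, both names refer to the same function
```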

# Functions

In [None]:
def adder(x,y):
    s = x+y
    return s

In [None]:
adder(2,3)
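
Functions can also take default and keyword arguments. The variant below is only a sketch; `add_with_default` is a made-up name used so that the original `adder` stays unchanged.

```python
def add_with_default(x, y=1):
    """Return the sum of x and y; y defaults to 1."""
    return x + y

print(add_with_default(2, 3))        # 5
print(add_with_default(2))           # 3  (uses the default y=1)
print(add_with_default(y=10, x=5))   # 15 (keyword arguments can be given in any order)
print(add_with_default('a', 'b'))    # ab (+ also concatenates strings)
```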

# Classes

In [None]:
class Holiday():
    def __init__(self,holiday):
        self.base = 'Happy {}!'
        self.greeting = self.base.format(holiday)
    
    def greet(self):
        print(self.greeting)
        
easter = Holiday('Easter')
hanukkah = Holiday('Hanukkah')

In [None]:
easter.greeting

In [None]:
hanukkah.greet()

In [None]:
# extend class

class Holiday_update(Holiday):
    
    def update_greeting(self, new_holiday):
        self.greeting = self.base.format(new_holiday)

In [None]:
hhg = Holiday_update('July 4th')

In [None]:
hhg.greet()

In [None]:
hhg.update_greeting('Labor day / End of Burning Man')
hhg.greet()
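
A subclass can also override an inherited method and still reuse the parent's behaviour through `super()`. The sketch below repeats the `Holiday` class so that it runs on its own; `LoudHoliday` is a hypothetical subclass invented for this example.

```python
class Holiday:
    def __init__(self, holiday):
        self.base = 'Happy {}!'
        self.greeting = self.base.format(holiday)

    def greet(self):
        print(self.greeting)


class LoudHoliday(Holiday):
    """Hypothetical subclass that overrides greet()."""

    def greet(self):
        super().greet()                 # reuse the parent's behaviour
        print(self.greeting.upper())    # then add something new


xmas = LoudHoliday('Christmas')
xmas.greet()                            # prints the greeting, then the upper-case version
print(isinstance(xmas, Holiday))        # True, a LoudHoliday is also a Holiday
```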
