# Getting Started With Data Mining

This document contains information you will need for the workshop. Please make sure to read the entire document. Any questions may be directed towards the ACM discord @ https://discord.gg/y64rhyXka3 in the #data-mining-workshop channel.

Note: You can access a **runnable** version of this notebook @ https://colab.research.google.com/drive/1-rxLup8kQoyx9-J69jOfEP7ZizzHIvJ1?usp=sharing. Just click connect at the top right corner of the website!

## Installing Python 3
1. Navigate to https://www.python.org/downloads/
2. Scroll down under the 'Looking for a specific release?' section & click download next to Python 3.9.0
3. Scroll all the way down and download the 'Windows x86-64 executable installer' for 64-bit Windows or the 'macOS 64-bit installer' for 64-bit macOS.
4. Open the installer and follow the prompts
    - **Make sure to check the box 'Add Python 3.9 to PATH'**
    - At the end of setup, check off Disable path length limit & Allow the app to make changes.
5. Open up the terminal and type in 'python --version'. The console should respond with Python 3.9.0.

## Getting Started With Python 3
Python is an interpreted language, meaning that source code is compiled into bytecode, which is then executed by the Python interpreter's virtual machine rather than directly by the OS.

Python is dynamically typed, meaning that you do not declare the type of a variable when assigning a value to it.
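
As a small sketch of dynamic typing, the same name can be rebound to values of different types, and type() reports whatever it currently holds. The variable name `value` is only for illustration.

```python
# The same name can be rebound to values of different types.
value = 10
print(type(value))  # <class 'int'>

value = "ten"
print(type(value))  # <class 'str'>

value = 10.0
print(type(value))  # <class 'float'>
```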

Python uses indents (and not {}) to identify code blocks. The end of a line marks the end of a statement, so we do not need a semicolon to end a statement.

We will be going over the basics of Python. Feel free to experiment in the Python interactive shell, which executes each statement as soon as you enter it. Open the Python shell by opening your computer's terminal and typing "python".

### Variables
Variables in Python can be visualized as names that point to objects in memory. The type of a variable is determined by the value it currently holds. Booleans are written as True or False.

We can use the print() function to output the contents of a variable to the console.

Try practicing @ https://www.learnpython.org/en/Variables_and_Types

In [1]:
name = 10
boolean = True
print(name)
print(boolean)

10
True


In [2]:
name = "HackUTD"
print(name)

HackUTD


We can do some pretty cool stuff with ints and strings

In [3]:
a = 10
print(a)
a += 1 # Python has no ++ or -- operators
print(a)

10
11


In [4]:
b = 'HackUTD'
b += ' Workshop!'
print(b)

HackUTD Workshop!


You can gather input from a user using the input() function

In [5]:
print("What is your name?")
inputVal = input()
print("Hi there, " + inputVal)

What is your name?
Jake
Hi there, Jake
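
One thing worth knowing is that input() always returns a string, so numeric input has to be converted before doing arithmetic. The sketch below uses a hypothetical helper, `years_until_100`, and passes it a fixed string in place of real keyboard input so that it runs without prompting.

```python
def years_until_100(age_text):
    """Convert the text a user typed into an int and do arithmetic with it."""
    age = int(age_text)  # raises ValueError if the text is not a whole number
    return 100 - age

# In an interactive session you would pass the result of input():
#     answer = input()
#     print(years_until_100(answer))
print(years_until_100("20"))  # 80
```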


### Control Structures
There are several control structures in Python:
- if
- if else
- if elif else
- while
- for
- functions

Notice that there is no switch statement in Python 3.9 (a match statement was added in Python 3.10).
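
Where another language would use a switch, a chain of if/elif/else usually does the job. A minimal sketch, with made-up command names:

```python
command = "stop"

# Each elif plays the role of a "case"; the final else is the "default".
if command == "start":
    message = "Starting"
elif command == "stop":
    message = "Stopping"
elif command == "pause":
    message = "Pausing"
else:
    message = "Unknown command"

print(message)  # Stopping
```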


Python relational & boolean operators:
- == and !=
- < <= > >=
- and or not
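
A quick sketch of the operators listed above, using illustrative variables `age` and `has_ticket`:

```python
age = 20
has_ticket = True

print(age >= 18)                   # True
print(age != 20)                   # False
print(age >= 18 and has_ticket)    # True
print(age < 18 or not has_ticket)  # False

# The result of a comparison is an ordinary boolean that can be stored
can_enter = age >= 18 and has_ticket
print(can_enter)                   # True
```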

Sample if statement:

In [6]:
a = 10

if a == 10:
    print("True")

True


Notice how the indents were used to signify code blocks, how parentheses were not required around the condition, and the : after the if statement.

Here is another example using the if/else statement

In [7]:
a = 10
b = 20

if a == b:
    print("A is equal to B")
else:
    print("A is not equal to B")

A is not equal to B


Here is an example using the if/elif/else statement

In [8]:
if a == b:
    print("A is equal to B")
elif a < b:
    print("A is less than B")
else:
    print("A is greater than B")

A is less than B


While statement example

In [9]:
i = 0

while i < 10:
    print(i)
    i += 1

0
1
2
3
4
5
6
7
8
9
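
The for statement from the list of control structures can express the same kind of counting loop more compactly. A minimal sketch using the built-in range(), which yields 0 up to (but not including) its argument:

```python
total = 0

# range(5) yields 0, 1, 2, 3, 4
for i in range(5):
    print(i)
    total += i

print("Sum:", total)  # Sum: 10
```

Unlike the while loop, there is no counter to initialize or increment by hand.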


**Functions** can be declared as:

    def f_name(parameters):
        statement(s)
        return expression(s)  # or just return

and called by name:

    f_name(parameters)

In [10]:
def printName(name):
    print("My name is " + name)

printName("Kevin")

My name is Kevin
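
The function above only prints. As a sketch of the return form, here is a hypothetical `add` function that returns a value and gives its second parameter a default:

```python
def add(a, b=1):
    """Return the sum of a and b; b defaults to 1 when omitted."""
    return a + b

print(add(2, 3))  # 5
print(add(2))     # 3
```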


Get some practice in @ https://www.learnpython.org/en/Conditions & https://www.learnpython.org/en/Loops

### Lists
A list is an ordered collection of objects. Any object can be held inside of a list. Each item in the list does not need to be the same type.

Lists are declared in square brackets []

In [11]:
l = []
l.append('a') # adding to a list
l.append('b')
l.append('c')
print(l)

print(l[0]) # accessing an item in a list

['a', 'b', 'c']
a
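
Because list items do not need to share a type, a list such as the illustrative `mixed` below is perfectly valid. The sketch also shows len(), negative indexing, slicing, and replacing an item in place:

```python
mixed = [1, "two", 3.0, True]

print(len(mixed))   # 4
print(mixed[-1])    # True (negative indexes count from the end)
print(mixed[1:3])   # ['two', 3.0] (items at index 1 and 2)

mixed[0] = "one"    # lists are mutable, so items can be replaced
print(mixed)        # ['one', 'two', 3.0, True]
```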


You can find more information about lists @ https://github.com/kjmazidi/Python_for_AI/blob/master/1-Python_Basics/05%20-%20Lists.ipynb

### Tuples
Tuples are immutable sequences: they behave like lists, but their contents cannot be changed after creation. This is helpful when you wish to store fixed data or return multiple values from a function.

A tuple can be created simply by placing comma-separated objects on the right-hand side of the assignment operator.

Tuples use parentheses () while lists use brackets []!

In [12]:
example = ("hello", 1) # we create a tuple with ()

print(example[0])
print(example[1])

hello
1
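
Immutability is easy to see by trying to assign to an item. The sketch below (with illustrative names `point` and `pair`) catches the resulting TypeError, and also shows that the commas, not the parentheses, are what create a tuple:

```python
point = ("hello", 1)

try:
    point[0] = "goodbye"  # tuples do not support item assignment
except TypeError as error:
    print("Cannot modify a tuple:", error)

# Parentheses are optional; the commas make the tuple
pair = "hello", 1
print(pair == point)  # True
```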


Example of returning multiple values from a function:

In [13]:
def example(num1, num2):
    return (num1 * 5, num2 * 2)

val1, val2 = example(5, 10) # val1 holds the first value returned by the function (num1 * 5) and val2 holds the second (num2 * 2)

print(val1)
print(val2)

25
20


### Sets
Sets are unordered collections of objects. Duplicate entries are removed.

We can create a set by enclosing its items in curly braces {}

In [14]:
people = {'Jake', 'Paul', 'Mark', 'Paul'}

You can check if a set contains an item by using the 'in' keyword

In [15]:
'Jake' in people

True

In [16]:
'Robert' in people

False
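
A short sketch of what duplicate removal means in practice, plus two details that often trip people up: an empty pair of braces creates a dict, not a set, and sets support operations such as intersection. The `staff` set is made up for the example.

```python
people = {'Jake', 'Paul', 'Mark', 'Paul'}
print(len(people))         # 3, because the duplicate 'Paul' is stored once

people.add('Robert')       # add() inserts a new item
print('Robert' in people)  # True

empty_set = set()          # use set() for an empty set
print(type({}))            # <class 'dict'>
print(type(empty_set))     # <class 'set'>

staff = {'Paul', 'Ann'}
print(people & staff)      # {'Paul'} (items present in both sets)
```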

### Dicts

Dictionaries in Python are implemented as hash tables, with keys that map to values.

In [17]:
dictVar = {} # declaring a dict
dictVar['1'] = "Adam" # assign a value with varName[key] = value
dictVar['2'] = "John"
dictVar['10'] = "Kat"

print(dictVar['10'])

Kat
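
Looking up a key that does not exist with dictVar[key] raises a KeyError, so it can help to check with 'in' or use get(). A minimal sketch that rebuilds the same dict and also loops over its key/value pairs:

```python
dictVar = {'1': "Adam", '2': "John", '10': "Kat"}

print('2' in dictVar)           # True ('in' checks the keys)
print(dictVar.get('3'))         # None (get() returns None for a missing key)
print(dictVar.get('3', "n/a"))  # n/a (or a default you supply)

# items() yields (key, value) pairs in insertion order
for key, value in dictVar.items():
    print(key, value)
```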


You can find more information @ https://github.com/kjmazidi/Python_for_AI/blob/master/1-Python_Basics/07%20-%20Dicts.ipynb

### Files
Python makes it easy to read and write to files.

You can open a file for reading using the open() function. 

In [18]:
f = open('sample.txt','r') # the 'r' means we are only reading the file
text = f.read()
print('File contents:\n', text)
f.close()

File contents:
 I love data
Data mining is great
foo
bar


You can use the "with" statement to close a file automatically once the indented block finishes.

In [19]:
with open('sample.txt', 'r') as f:
    text = f.read()
print("File contents:\n", text)

File contents:
 I love data
Data mining is great
foo
bar


You can read a line at a time using a for loop.

In [20]:
with open('sample.txt', 'r') as f:
    for line in f:
        print(line)

I love data

Data mining is great

foo

bar
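
The blank lines in the output above appear because each line read from the file keeps its trailing newline and print() adds another. One way to avoid this is to strip the newline first. The sketch below writes its own stand-in for sample.txt to a temporary directory (an assumption, so that it runs anywhere) and collects the cleaned lines in a list:

```python
import os
import tempfile

# Create a small stand-in for sample.txt so the example runs on its own
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as f:
    f.write("I love data\nData mining is great\nfoo\nbar\n")

lines = []
with open(path, 'r') as f:
    for line in f:
        clean = line.rstrip('\n')  # drop the trailing newline
        lines.append(clean)
        print(clean)
```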


You can write to a file using the open() function and the file.write() function. (We are using the "with" statement to automatically close the file after it is done writing.)

In [21]:
with open('output.txt','w') as f:
    f.write("Hello World!\n")
    f.write("I love data!\n")
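
Note that the 'w' mode creates the file if needed and overwrites any existing contents, while the 'a' mode appends to the end. A minimal sketch, writing to a temporary directory (an assumption, to avoid touching your working folder) and reading the result back:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'output.txt')

with open(path, 'w') as f:  # 'w' creates or overwrites the file
    f.write("Hello World!\n")

with open(path, 'a') as f:  # 'a' appends to the existing contents
    f.write("I love data!\n")

with open(path, 'r') as f:
    contents = f.read()

print(contents)
```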

## Installing Packages

Python comes with a default package installer called pip.

The basic way to install a package is to launch your **terminal** and type in:

'pip install SomePackage' **or** 'python -m pip install SomePackage'

You can install a specific version of a package by typing in:

'pip install SomePackage==version' **or** 'python -m pip install SomePackage==version'


### For the Workshop...
In this workshop, we will be using three Python packages: Scrapy, BeautifulSoup, & Seaborn. Specific versions of these packages will be used in the workshop & are outlined below.

Install Scrapy by typing in terminal:

**'python -m pip install scrapy==2.4.1'**

Install BeautifulSoup by typing in terminal:

**'python -m pip install beautifulsoup4==4.9.3'**

Install Seaborn by typing in terminal:

**'python -m pip install seaborn==0.11.1'**

**Make sure you have these three packages installed before the workshop begins.**
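
One way to confirm the installs from inside Python is the standard-library importlib.metadata module (available from Python 3.8). The sketch below uses a hypothetical helper, `installed_version`, which returns None when a package is missing. The printed versions depend on your machine, so none are shown here.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Package names as given to pip in the install commands
for package in ("scrapy", "beautifulsoup4", "seaborn"):
    print(package, installed_version(package))
```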