# So What next? A quick tour of some useful libraries for research

### What we already know
- basic data types (ints, floats, strings, booleans)
- Some functions and methods to manipulate these data types (e.g. type(), len(), str.upper())
- where to put our data (Variables, Lists, Dictionaries)
- and how to manipulate and build these variables and lists
- How to create, store, and use data in a dictionary
- Use conditional statements to allow the computer to choose when to execute certain commands
- use loops so that the computer can repeat the brilliant code you have written for it
- Made your own functions!

__Learning Objectives__:

- Use our first Packages to work with numbers!
- Upload some text
- Use some more Packages to analyse our text!

## In summary we have barely scratched the surface of Python's capabilities

### That's not to sell us short: we have learned some of the classic concepts in computer programming, which apply to almost every programming language (even our best frenemy R)

- It's just all about where you put your tabs, commas, colons, etc.

- And you can probably still do just about anything with the tools we have now... you would just have to type a lot of code

### Back to our Kitchen metaphor... We have a very basic kitchen right now for our Pydog to work with, and he could make toast in the oven right?

<center><img src="https://gitlab.unimelb.edu.au/rescom-training/python/introduction-to-python-for-researchers/-/raw/master/Imbedded%20Pics/toast_in_oven.jpg" alt="toast in oven" style="width:auto;height:40vh"></center>



### But wouldn't it be easier and more efficient to make it in a toaster?

<center><img src="https://gitlab.unimelb.edu.au/rescom-training/python/introduction-to-python-for-researchers/-/raw/master/Imbedded%20Pics/happytoast.jpg" alt="happy toast" style="width:auto;height:70vh"></center>



## Luckily, heaps of people have crafted the Python Equivalent of Toasters, Garlic Presses, Mandoline Slicers, etc etc....

- And we are lucky: when people code up fancier *Packages*, they usually ***borrow*** behaviours from our old mates the ***list*** and the ***dictionary*** (***methods, mutability, iterability, sliceability, etc.***), so you practically already know how to use them!!!



**Example** Since lists are a lot like Excel spreadsheet columns, let's do some maths with two lists and see what happens

In [None]:
# let's just add these two guys together
xs = [1,2,3,4,5]
ys = [100,200,300,400,500]
xs+ys

Uh oh! Looks like + just joins (concatenates) the two lists together in Python

In [None]:
"kind of "+"like"+" strings!"

In [None]:
# we could do a one-liner list comprehension with a new function called zip, which pairs up the items of the two lists like a zipper
z = [x+y for x, y in zip(xs,ys)]
z
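
To see what `zip()` is doing inside that one-liner, here is a small sketch using the same two lists. It prints the pairs that `zip()` produces, then rebuilds the sum with an ordinary loop. The names `pairs` and `z_loop` are made up for this illustration.

```python
# The same toy lists as above
xs = [1, 2, 3, 4, 5]
ys = [100, 200, 300, 400, 500]

# zip() pairs up items by position; wrap it in list() to look inside
pairs = list(zip(xs, ys))
print(pairs)

# the same element-wise sum as the list comprehension, written as a plain loop
z_loop = []
for x, y in zip(xs, ys):
    z_loop.append(x + y)
print(z_loop)
```

Both versions give the same answer; the list comprehension is just the compact way to write the loop.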

#### But that is bloody annoying if we want to do a lot of maths on lists. Luckily, lots of people agreed with me and built NumPy and SciPy, your open source replacement for MATLAB!

In [None]:
# Finally going to get our PyToaster out!!!!
import numpy as np # Note the as np part means I have given this package a nickname

In [None]:
xar = np.array(xs)
yar = np.array(ys)

zar = xar+yar

zar

In [None]:
# and it's pretty much just like a list!!

zar[-1]-zar[0:3]

In [None]:
# However, numpy arrays have more rules to live by, i.e. they really need all of the same data type to be useful

xs = np.array([1,2,'3'])

print('type for item 0', type(xs[0]))
print('type for item 2', type(xs[2]))
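
A minimal sketch of what happened in the cell above, assuming NumPy is installed: when one item is a string, NumPy quietly turns every item into text. The sketch then converts the array back to integers with `astype(int)` so the maths works again. The names `mixed` and `fixed` are made up for this illustration.

```python
import numpy as np

# one string in the list is enough to make the whole array text
mixed = np.array([1, 2, '3'])
print(mixed.dtype)   # a text (unicode) dtype, not an integer one
print(mixed)

# convert every item back to an integer so we can do maths again
fixed = mixed.astype(int)
print(fixed.dtype)
print(fixed + 1)
```

Note that `astype(int)` only works here because every item looks like a whole number; text such as `'three'` would raise an error.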

### Numpy has piles and piles of other awesome things associated with it but we don't have time to get into them
- and the cost is we have to be more careful about what kind of data get put into an array

In [None]:
# But loading data is as easy as a function

# just remember Python is looking for that filename in the same folder as the jupyter notebook
data = np.loadtxt('rando_data.csv', delimiter=',')

# numpy arrays can have multiple dimensions
print(data.shape)

In [None]:
# and can easily be plotted with matplotlib

%matplotlib inline 
import matplotlib.pyplot as plt

plt.imshow(data)


## But what if we have whole spreadsheets of data? Then make friends with Pandas!

Pandas is handy for a lot of things, and it is the go-to tool for reading spreadsheet-style data (such as CSV files) into Python these days

In [None]:
import pandas as pd

# this file is in the same folder as the notebook
df = pd.read_csv("iris.csv")

df.head()

And you can really just think of it as a dictionary, full of numpy arrays

In [None]:
# pull out one of the columns
df['sepal_length'][3]

### Pandas has some awesome one line functionality

In [None]:
df.describe()

In [None]:
# one line plotting functionality
# note: this line is needed for plots to appear inside the Jupyter notebook
%matplotlib inline 

# quickly histogram all of your data
df.hist()

In [None]:
# With seaborn we can make even better looking graphs!
import seaborn as sns
import matplotlib.pyplot as plt

g = sns.PairGrid(df, hue="species")
g = g.map_diag(plt.hist)
g = g.map_offdiag(plt.scatter)
g = g.add_legend()

### Challenge at home: But what if we want just one column to plot using dataFrame.hist()? 

Let's Google it. Hint: our df is a pandas "DataFrame" object

### Once again, we have barely scratched the surface of the functionality of these Packages
- If you are intrigued, let us know, because we are actually keen on giving short courses on Numpy, Pandas, Scipy, and plotting with Python

## Working with text files

## File Input/Output

- Reading Files is something Python is a gun at
- you use a function called `open()` that brings in a file object.
- You can then use a variety of ***Methods*** to process that object
- `open()` takes two main arguments: the file path/file name you want to open, and the "access mode"

There are 4 general "access modes"

- "r" = read. 
    - only lets you _read_ the data from the file.
- "w" = write
    - lets you _write_ data into a file
- "a" = append
    - lets you _add_ data to the end of an existing file. 
- "r+" = special read/write
    - lets you both read and write to/from the same file. 
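
Here is a small sketch of the "w", "a" and "r" modes in action. It works in a throwaway temporary folder so none of your real files are touched; the file name `modes_demo.txt` and the variable names are made up for this illustration. It also uses a `with` block, which is not covered above: it is a common way to make sure the file gets closed automatically when the block ends.

```python
import os
import tempfile

# Work in a throwaway folder so no real files get overwritten
scratch_dir = tempfile.mkdtemp()
path = os.path.join(scratch_dir, "modes_demo.txt")

# "w" creates the file (or wipes it if it already exists)
with open(path, "w") as f:
    f.write("first line\n")

# "a" adds to the end without erasing anything
with open(path, "a") as f:
    f.write("second line\n")

# "r" only reads
with open(path, "r") as f:
    after_append = f.read()

# opening with "w" again wipes the old contents
with open(path, "w") as f:
    f.write("only line\n")

with open(path, "r") as f:
    after_rewrite = f.read()

print(after_append)
print(after_rewrite)
```

The second print shows only one line: reopening the file with "w" threw away the two lines that were there before.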

### Be careful using "w"/write though! If you open a file using the "w" option, anything that used to be inside that file will be erased. You would really only use this when you are making a new file

- Let's test the "write" access mode

In [None]:
example = open("Robin.txt", "w") #THIS WILL DELETE EVERYTHING IN THE FILE!!!
example.close()

In [None]:
# You essentially only use this to write a new file

examplew2 = open('new_example.txt','w')

'''
This is another way to comment, usually you would put some data into our new file
'''

examplew2.close()

Open the file again on your computer. See how the example text has been erased? ***Be very wary of this, guys!***

### File Input - read() function

Opening your file is pretty useless if you can't actually see what's in it though

In [None]:
#open(file name , method of opening)
file = open("Zen_of_Python.txt","r")

type(file)

In [None]:
type(file) # What the hell is that?

In [None]:
# This will give me the whole file's contents at once
file = open("Zen_of_Python.txt","r")
file.read()

### So there are a few things to examine in here. 
- Where did the line breaks go?
- They are still there, shown as those `\n` symbols. These are "newline" characters: when the computer displays text, it uses this character to know when to start a new line.
- Our `print()` gets it:

In [None]:
file = open("Zen_of_Python.txt","r")
file_all = file.read()

print(file_all)

The most common non-printable aka **White space** characters you need to know are:

- \n : new line. You may also sometimes see \r\n on files written by Windows systems.
- \t : tab
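
A quick sketch of how these characters behave, using three lines from the Zen of Python typed in by hand (the tab before the third line is added here just for the demonstration). `repr()` shows the hidden characters, `print()` acts on them, and `splitlines()` and `strip()` are two standard string methods for getting rid of them.

```python
# three lines of text joined by newline characters, with a tab before the last one
poem = "Beautiful is better than ugly.\nExplicit is better than implicit.\n\tFlat is better than nested."

# repr() shows the hidden characters exactly as they are stored
print(repr(poem))

# print() acts on them: new lines and a tab indent appear
print(poem)

# splitlines() cuts the text into a list of lines at each newline
lines = poem.splitlines()
print(lines)

# strip() removes white space from both ends of each line
stripped = [line.strip() for line in lines]
print(stripped)
```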



### The annoying thing is, we probably want to ignore these characters to analyse a text meaningfully right? 
- Introducing the magic doodad method ```replace()```
- I am tired of explaining things so let's just google it and figure out how it works

In [None]:
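# Go back up and grab our text
file_all2 = file_all  # start from the text we read in earlier
replacers = ['\n', '.', ',', '!', '-']
for replacer in replacers:
    # swap each unwanted character for a space
    file_all2 = file_all2.replace(replacer, ' ')
file_all2
file_all2 = file_all2[17:]  # slice off the first 17 characters (the start of the title line)
file_all2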

### Minichallenge - Let's get rid of the title, author, and our punctuation characters: commas, periods, '--' thingies, and exclamation points

Remember: strings can be treated a lot like lists right so you can slice em up!!!
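
If you want to practise on something tiny first, here is a sketch on a made-up two-line text (the title "Toy Title" is invented for this example). It shows that `replace()` hands back a new string rather than changing the original, and that slicing a string works just like slicing a list.

```python
# A made-up two-line text: a title line, then one line of the poem
toy = "Toy Title\nSimple is better than complex."

# replace(old, new) returns a NEW string; toy itself is unchanged
one_line = toy.replace("\n", " ")

# Strings slice like lists: skip the title and the space after it
start = len("Toy Title ")
body = one_line[start:]

# Chain a couple more replace() calls to drop punctuation
clean = body.replace(".", "").replace(",", "")

print(one_line)
print(body)
print(clean)
```

Counting the characters with `len()` saves you from working out the slice position by hand.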

# Let's use the Natural Language Toolkit to take a look at our text:

We want to collect the following info from our poem:

- What words were used the most?

In [None]:
import nltk #importing the nltk 
import collections #importing collections, which is part of Python's standard library (not part of nltk)
from nltk.tokenize.casual import casual_tokenize #importing a particular function from nltk

In [None]:
# Make a copy of our text; it's a string (which can't be changed in place), so we don't need to use a copy command

text = file_all2

In [None]:
text = text.lower()
text

In [None]:
text_tok = casual_tokenize(text) # use our imported tokenizing function, let's see what it does

text_tok

In [None]:
from collections import Counter

#Counter lets you quickly count how many times a word occurs

In [None]:
# Used on a string, it just counts the individual letters
Counter('')

In [None]:
Counter(["What",'about','a', 'list','a','list','you','say'])

In [None]:
text_tok_counted = Counter(text_tok)
text_tok_counted

In [None]:
text_tok_counted.most_common(2)
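
A `Counter` is a special kind of dictionary, so the dictionary skills from earlier carry straight over. The sketch below reuses the toy word list from above (in lower case); the names `words`, `counts` and `top_two` are made up for this illustration.

```python
from collections import Counter

words = ["what", "about", "a", "list", "a", "list", "you", "say"]
counts = Counter(words)

# look up a word just like a dictionary key
print(counts["list"])

# a word that never appears counts as 0 instead of raising a KeyError
print(counts["python"])

# most_common(n) gives a list of (word, count) pairs, so we can loop over it
top_two = counts.most_common(2)
for word, n in top_two:
    print(word, n)
```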

# CONGRATULATIONS FOR REACHING THE END

And thus begins your journey into the world of Python. The coming weeks and months are going to be frustrating - you know that you *can* do something, but you *just can't quite* remember how it's done. Or maybe you need to keep looking up the simple things. And general syntax is going to be horrendous to remember.


![image.png](https://gitlab.unimelb.edu.au/rescom-training/python/introduction-to-python-for-researchers/-/raw/master/Imbedded%20Pics/ending_meme.png)

## Python Class Graduation Speech/ commandments:

1.) Remember the 7 steps of error grief, but...

2.) "Don't cry over spilt milk" doesn't really even apply because you didn't even spill any milk (as long as you save older versions of your code that work ;p)

3.) Work a couple of lines at a time, and don't be afraid to work on silly little toy data sets to figure out how stuff works

4.) The print() command is a great way to "look under the hood". If your code isn't doing what you think it should, print out your variables and see what's going on

5.) ```# COMMENT```: do not assume that future you will remember why you put that loop there or what the variable 'a' stands for

6.) There are many ways to do the same thing in Python; don't worry too much about which way you choose, especially while learning

7.) However, you will **Never** stop learning; there will always be new cool packages to use in Python



## So what next?

There are lots of online resources and modules; here is one designed mostly for grad students:

https://software-carpentry.org/

***Try to do something really easy***, something that you already know how to do really quickly in Excel etc., but try it with Python

### But most importantly: Stay connected with our community, we are here to help

We have other trainings in nltk and high speed programming with python, possibly some other package training in the future:

Follow the python community on twitter to find out about them

@GeoGarber, @ResPlat