## Closing Thoughts

Congratulations, you survived this programming class! It's been a journey: we laughed, we cried, and we learned a little bit about Python.

What did we learn, exactly? Let's go back to the very first day, when we talked about two basic concepts:

* Representation
* Abstraction


### Representation

We learned that information can be represented in multiple ways. Let's say we want to represent some information about our friend Bob. We could do this using lists, like so: 

In [52]:
values = ['Bob', 'blue', 'black', '6ft0in', '200lbs']
attributes = ['name', 'eyecolor', 'haircolor', 'height', 'weight']

print(attributes)
# print(values)

print(values[0])  # name
print(values[1])  # eye color
print(values[4])  # weight

['name', 'eyecolor', 'haircolor', 'height', 'weight']
Bob
blue
200lbs


Or we can represent exactly the same information as a dictionary. The content is identical, but the format is more intuitive for humans, since we can refer to attributes by name instead of by position.

In [43]:
personD = dict(zip(attributes, values))

print(personD['name'])
print(personD['eyecolor'])
print(personD['weight'])

Bob
blue
200lbs


We also learned about special datatypes like `numpy` arrays and `pandas` DataFrames for representing numerical and tabular data, and we learned how to create visual representations of these numbers using `matplotlib`.

**In all cases, we take information from the real world and represent it on a computer. Once we do that, we can manipulate it and make use of it.**

The different datatypes have different costs and benefits. You as the programmer need to decide which one is most appropriate for the situation. Sometimes it's just personal preference. There are always multiple ways to solve problems!
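
As a rough illustration of those trade-offs, the sketch below reuses the `attributes` and `values` lists from above. The names `weight_index` and `pairs` are introduced only for this example.

```python
attributes = ['name', 'eyecolor', 'haircolor', 'height', 'weight']
values = ['Bob', 'blue', 'black', '6ft0in', '200lbs']

# With two parallel lists, we first have to find the position of an attribute
weight_index = attributes.index('weight')
print(values[weight_index])

# zip pairs each attribute with its value...
pairs = list(zip(attributes, values))
print(pairs[0])

# ...and dict turns those pairs into name -> value lookups
personD = dict(pairs)
print(personD['weight'])

# Lists have their own strengths: position-based slicing is easy
print(values[:2])
```

The list version makes position-based operations easy, while the dictionary version makes name-based lookups easy. Neither is "right"; it depends on what you need to do with the information.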

### Abstraction

We also learned a lot about abstraction. Abstraction is about how many tedious, low-level details you have to deal with yourself: the more abstract the tool, the fewer details you handle. Remember loading in data using the `csv` package? Here we load in a file, grab just the RT column, and compute the mean. Notice that we have to do a lot of looping and converting of data.

In [44]:
import csv
import numpy as np

data = []

with open('./datasets/behavioral.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')

    for row in reader:
        data.append(row)

RTs = []

# grab just the RT "column", skipping the header row
for row in data[1:]:
    rt = float(row[5])  # every value is read in as a string, so convert it
    RTs.append(rt)

print(data[:25])  # a list of lists
print(np.mean(RTs))
        

[['ID', 'Block', 'Trial', 'Task', 'Accuracy', 'RT'], ['801', '1', '1', '2', '1', '1752.04'], ['801', '1', '2', '2', '1', '823.718'], ['801', '1', '3', '2', '1', '646.493'], ['801', '1', '4', '2', '1', '592.449'], ['801', '1', '5', '2', '1', '679.79'], ['801', '1', '6', '2', '1', '761.609'], ['801', '1', '7', '2', '0', '555.387'], ['801', '1', '8', '2', '1', '1020.5'], ['801', '1', '9', '2', '1', '1090.44'], ['801', '1', '10', '2', '1', '2086.91'], ['801', '1', '11', '2', '1', '688.475'], ['801', '1', '12', '2', '1', '682.268'], ['801', '1', '13', '2', '0', '1228.09'], ['801', '1', '14', '2', '1', '964.625'], ['801', '1', '15', '2', '1', '832.73'], ['801', '1', '16', '2', '1', '692.578'], ['801', '1', '17', '2', '1', '929.221'], ['801', '1', '18', '2', '1', '642.677'], ['801', '1', '19', '2', '1', '775.196'], ['801', '1', '20', '2', '1', '691.337'], ['801', '2', '1', '1', '1', '9886.39'], ['801', '2', '2', '1', '1', '1587.63'], ['801', '2', '3', '1', '1', '1104.97'], ['801', '2', '4', '

`numpy` abstracts some of the details for us, so we can read the csv file with one line of code. The `numpy` developers have taken care of the tedious details for us.

Notice, though, that we still need to figure out which column is the RT column (in this case, it's the 6th one, at index 5). `numpy` requires the programmer to figure that out for themselves.

In [45]:
data = np.genfromtxt('./datasets/behavioral.csv', delimiter=',', skip_header=1)

print(data[:25, :])
np.mean(data[:, 5])  # column index 5 is RT


[[  8.01000000e+02   1.00000000e+00   1.00000000e+00   2.00000000e+00
    1.00000000e+00   1.75204000e+03]
 [  8.01000000e+02   1.00000000e+00   2.00000000e+00   2.00000000e+00
    1.00000000e+00   8.23718000e+02]
 [  8.01000000e+02   1.00000000e+00   3.00000000e+00   2.00000000e+00
    1.00000000e+00   6.46493000e+02]
 [  8.01000000e+02   1.00000000e+00   4.00000000e+00   2.00000000e+00
    1.00000000e+00   5.92449000e+02]
 [  8.01000000e+02   1.00000000e+00   5.00000000e+00   2.00000000e+00
    1.00000000e+00   6.79790000e+02]
 [  8.01000000e+02   1.00000000e+00   6.00000000e+00   2.00000000e+00
    1.00000000e+00   7.61609000e+02]
 [  8.01000000e+02   1.00000000e+00   7.00000000e+00   2.00000000e+00
    0.00000000e+00   5.55387000e+02]
 [  8.01000000e+02   1.00000000e+00   8.00000000e+00   2.00000000e+00
    1.00000000e+00   1.02050000e+03]
 [  8.01000000e+02   1.00000000e+00   9.00000000e+00   2.00000000e+00
    1.00000000e+00   1.09044000e+03]
 [  8.01000000e+02   1.00000000e+00  

808.22422437500006
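
The column lookup does not have to be positional, even without extra packages. As a minimal sketch, the standard library's `csv.DictReader` uses the header row as dictionary keys, so we can ask for `'RT'` by name. To keep the example self-contained, it reads a small in-memory sample (the header and first three trials shown above) rather than the real file; `sample` and `mean_rt` are names made up for this illustration.

```python
import csv
import io

# A small in-memory stand-in for behavioral.csv, so this runs without the file
sample = io.StringIO(
    "ID,Block,Trial,Task,Accuracy,RT\n"
    "801,1,1,2,1,1752.04\n"
    "801,1,2,2,1,823.718\n"
    "801,1,3,2,1,646.493\n"
)

# DictReader reads the header row, then turns each later row into a dictionary
reader = csv.DictReader(sample)

# Values still arrive as strings, so we convert them ourselves
RTs = [float(row['RT']) for row in reader]

mean_rt = sum(RTs) / len(RTs)
print(RTs)
print(mean_rt)
```

We no longer count columns by hand, but we still loop and convert the values ourselves.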

`pandas` makes it even easier! Here we read the file with one line, and we can access the column by name just by saying `df.RT`. Pandas has *abstracted* away the tedious details of reading, looping through, and converting the data that we handled ourselves in the examples above.

This is only possible because someone else wrote the code. Somewhere in the code for the pandas package, there is a loop very similar to the one we created above.

In [46]:
import pandas as pd
from pandas import DataFrame

df = pd.read_csv('./datasets/behavioral.csv')

print(df.RT[:25])
print(df.RT.mean())

0     1752.040
1      823.718
2      646.493
3      592.449
4      679.790
5      761.609
6      555.387
7     1020.500
8     1090.440
9     2086.910
10     688.475
11     682.268
12    1228.090
13     964.625
14     832.730
15     692.578
16     929.221
17     642.677
18     775.196
19     691.337
20    9886.390
21    1587.630
22    1104.970
23    1143.480
24    1050.860
Name: RT, dtype: float64
808.224224375


As the solutions get more abstract, we need to worry about fewer details, and we can do the same work with fewer lines of code. That is what abstraction is all about.

### Abstraction and Generalizability

There's another way to think about abstraction. Imagine we want to load in a file and compute the mean of the RT column. The script above works for exactly one situation: computing the mean of the RTs for one specific file, `behavioral.csv`. Throughout the term, I urged you to first come up with these specific solutions, then modify them so they're more general-purpose. So, with one small change (storing the file name in a variable), I can make this work for any file that I name:


In [47]:

import pandas as pd

filename = './datasets/behavioral_shorter.csv'

df = pd.read_csv(filename)

print(df.RT[:25])
print(df.RT.mean())

0     1752.040
1      823.718
2      646.493
3      592.449
4      679.790
5      761.609
6      555.387
7     1020.500
8     1090.440
9     2086.910
10     688.475
11     682.268
12    1228.090
13     964.625
14     832.730
15     692.578
16     929.221
17     642.677
18     775.196
19     691.337
20    9886.390
21    1587.630
22    1104.970
23    1143.480
24    1050.860
Name: RT, dtype: float64
809.366770066


But I can take it even further. How about I take a list of file names and perform the same steps on each file? Now my solution is more general. It will work the same for 2 files or 2000 files. This is another way to think about abstraction: going from a very concrete solution (loading one particular file) to a more general one (loading many files).

In [48]:
import pandas as pd

file_list = ['./datasets/behavioral.csv', './datasets/behavioral_shorter.csv']

for filename in file_list:
    df = pd.read_csv(filename)

    print(df.RT[:25])
    print(df.RT.mean())

0     1752.040
1      823.718
2      646.493
3      592.449
4      679.790
5      761.609
6      555.387
7     1020.500
8     1090.440
9     2086.910
10     688.475
11     682.268
12    1228.090
13     964.625
14     832.730
15     692.578
16     929.221
17     642.677
18     775.196
19     691.337
20    9886.390
21    1587.630
22    1104.970
23    1143.480
24    1050.860
Name: RT, dtype: float64
808.224224375
0     1752.040
1      823.718
2      646.493
3      592.449
4      679.790
5      761.609
6      555.387
7     1020.500
8     1090.440
9     2086.910
10     688.475
11     682.268
12    1228.090
13     964.625
14     832.730
15     692.578
16     929.221
17     642.677
18     775.196
19     691.337
20    9886.390
21    1587.630
22    1104.970
23    1143.480
24    1050.860
Name: RT, dtype: float64
809.366770066
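
Typing out 2000 file names by hand would defeat the purpose, so the list itself can be generated. Here is a minimal sketch using the standard-library `glob` module. The temporary folder and the `subject_*.csv` file names are invented for illustration; with the class files, the pattern would presumably be something like `./datasets/*.csv`.

```python
import glob
import os
import tempfile

# Create a throwaway folder holding three empty, hypothetical data files
folder = tempfile.mkdtemp()
for name in ['subject_01.csv', 'subject_02.csv', 'subject_03.csv']:
    open(os.path.join(folder, name), 'w').close()

# glob returns every path that matches the pattern;
# sorting makes the order predictable
file_list = sorted(glob.glob(os.path.join(folder, '*.csv')))

for filename in file_list:
    print(os.path.basename(filename))
```

The loop over `file_list` stays exactly the same no matter how many files the pattern matches.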


But wait, there's more! Let's make our code into a function that can load in any list of files and compute the mean of any column. With a couple of small tweaks, it's even more general. Also, we can now refer to this solution by name: `calc_means`. I could bundle this code into a Python package and share it with someone else. They don't need to know about pandas or anything else. I just tell them, "use `calc_means`, it'll do what you want!" I have abstracted away the tedious details. For the person using it, several lines of code become just one line.

In [49]:
import pandas as pd

# function definition, we do this just once
def calc_means(file_list, column_name):

    allmeans = []

    for filename in file_list:
        df = pd.read_csv(filename)
        allmeans.append(df[column_name].mean())

    return allmeans
        

In [50]:
#...then we can use calc_means as many times as we want!
calc_means(file_list,'RT')

[808.2242243750001, 809.3667700657912]
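
To see why the person calling the function doesn't need to know what happens inside, here is a sketch of an alternative that takes the same arguments but uses only the standard library instead of pandas. The name `calc_means_plain` and the two tiny files are made up for this example, and this is not meant to show how pandas works internally.

```python
import csv
import os
import tempfile
from statistics import mean


def calc_means_plain(file_list, column_name):
    """Return the mean of column_name for each CSV file in file_list."""
    allmeans = []
    for filename in file_list:
        with open(filename, newline='') as f:
            # DictReader lets us pick the column by its header name
            column = [float(row[column_name]) for row in csv.DictReader(f)]
        allmeans.append(mean(column))
    return allmeans


# Two tiny made-up files, just to try the function out
folder = tempfile.mkdtemp()
contents = {
    'a.csv': "Accuracy,RT\n1,500.0\n0,700.0\n",
    'b.csv': "Accuracy,RT\n1,1000.0\n1,2000.0\n",
}

file_list = []
for name, text in sorted(contents.items()):
    path = os.path.join(folder, name)
    with open(path, 'w') as f:
        f.write(text)
    file_list.append(path)

print(calc_means_plain(file_list, 'RT'))
print(calc_means_plain(file_list, 'Accuracy'))
```

From the outside, both versions are used the same way: hand over a list of files and a column name, get back a list of means.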

### Life Lesson: Programming is Problem Solving!

I emphasized this point at the beginning of the term. Programming is not magical, and it's not really about computers. It's about *solving problems*. Programming is just a set of tools for problem-solving. I have tried to show you how to solve a wide range of problems using some of the tools that Python provides (e.g., lists, dictionaries, arrays, DataFrames). Different packages provide different tools for representing and manipulating information. Likewise, different programming languages will give you different tools.

**80% of programming is deciding how to divide up your problem and represent it on the computer**

**The other 20% is dealing with the peculiarities of your language (do I use brackets or parentheses? Where do the commas go?)**

Remember that there are multiple ways of solving any problem, because there are multiple ways of combining the tools that are available. As you get more experienced, you'll notice that you encounter the same types of problems over and over, and that the same solutions apply. The challenge is separating the *structure* of your problem from the *content* of what it represents. Once you master that, programming becomes much easier.
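
As a small sketch of what separating structure from content can look like, the hypothetical function `mean_of` below captures one structure (pull out a value from each record, then average) and applies it to two unrelated kinds of content. The reaction times are the first two trials from `behavioral.csv`; the people and their weights are made up.

```python
def mean_of(records, key):
    """Average the values stored under key across a list of dictionaries."""
    values = [float(record[key]) for record in records]
    return sum(values) / len(values)


# Content 1: reaction times, still stored as strings
trials = [{'RT': '1752.04'}, {'RT': '823.718'}]

# Content 2: made-up body weights in pounds
people = [{'name': 'Bob', 'weight': 200}, {'name': 'Alice', 'weight': 150}]

# Same structure, different content
rt_mean = mean_of(trials, 'RT')
weight_mean = mean_of(people, 'weight')

print(rt_mean)
print(weight_mean)
```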


### Other languages

I encourage you to learn other languages too! [R](https://cran.r-project.org) is my favorite for doing statistics. [Matlab](https://www.mathworks.com) is great for signal processing and plotting complex data. [Julia](http://julialang.org/) is a promising newer language that aims to combine the best parts of Matlab, Python, and R.

Don't worry, most of what you learned here will translate easily to other languages. In Matlab, the closest thing to a list is a cell array, and the closest thing to a dictionary is a structure. They're basically the same ideas! Things like looping and indexing are pretty universal, and you will encounter them in other languages. As you learn more, you'll notice more similarities.



### Moving Forward

I know you are not experts yet, but you've gotten a taste of the tools that are available to you. Going forward, you will learn about newer tools and more sophisticated ways of using the existing tools to solve your problems. As I mentioned above, you will get better at recognizing different types of problems and applying particular solutions to them.

The goal of this class was to teach you how to read the language and to master a few of the basics. Now you can go out into the world and seek out the help you need, and the solutions you find will not be complete gibberish! Don't worry if you're still feeling confused. That's totally normal. You are all leaving this class knowing more about Python than when you started. You have already succeeded.

