**00-short-introduction-to-Python.md** (4 changes: 2 additions & 2 deletions)

elements. Each element can be accessed by its index:

```python
>>> numbers = [1,2,3]
>>> numbers[0]
1
```

```python
ATuple = (1, 2, 3)
AList = [1, 2, 3]
```

### Challenge
1. What happens when you type `ATuple[2]=5` vs `AList[1]=5`?
2. Type `type(ATuple)` into Python - what is the object type?
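To see the difference for yourself, here is a short sketch (assuming `ATuple` and `AList` are defined as in this lesson):

```python
ATuple = (1, 2, 3)
AList = [1, 2, 3]

# Lists are mutable, so item assignment works
AList[1] = 5
print(AList)  # [1, 5, 3]

# Tuples are immutable, so item assignment raises a TypeError
try:
    ATuple[2] = 5
except TypeError as err:
    print("TypeError:", err)

print(type(ATuple))  # <class 'tuple'>
print(type(AList))   # <class 'list'>
```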


## Dictionaries
Expand Down
**01-starting-with-data.md** (70 changes: 35 additions & 35 deletions)

Each time we call a function that's in a library, we use the syntax
`LibraryName.FunctionName`. Adding the library name with a `.` before the
function name tells python where to find the function. In the example above, we
have imported pandas as `pd`. This means we don't have to type out `pandas` each
time we call a pandas function.


## Lesson Overview

We are studying the species and weight of animals caught in plots in a study
area. The data sets are stored in .csv (comma separated value) format. Within
the `.csv` files, each row holds information for a single animal, and the
columns represent: record_id, month, day, year, plot, species, sex, wgt.

The first few rows of our first file look like this:

and they can replicate the same analysis.

# Reading Data Using Pandas CSV
We will begin by locating and reading our survey data which are in CSV format.
We can use Pandas `read_csv` function to pull the file directly into a
[DataFrame](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe).

## So What's a DataFrame?

A DataFrame is a 2-dimensional data structure that can store data of different
types (including characters, integers, floating point values, factors and more)
in columns. It is similar to spreadsheets or SQL tables or the `data.frame` in
R.

First, let's make sure the python Pandas library is loaded. We will import
Pandas using the nickname `pd`.

```python
import pandas as pd
```

IPython Notebook, be sure to start the notebook in the workshop repository.
If you didn't do that you can always set the working directory using the code
below.

```python
import os
os.getcwd()
# if this directory isn't right, use the command below to set the working directory
os.chdir("YOURPathHere")
```

columns. It looks like the `read_csv` function in Pandas read our file
properly. However, we haven't saved any data to memory, so we can't yet work with it.
We need to assign the DataFrame to a variable. Remember that a variable is a
name for a value, such as `x`, or `data`. We can create a new object with a
variable name by assigning a value to it using `=`.

Let's call the imported survey data `surveys_df`:

```python
surveys_df = pd.read_csv("data/surveys.csv")
```
Now we can start manipulating our data! First, let's check the data type of the
object `surveys_df` using the `type` command. The `type` function and the
`__class__` attribute tell us that `surveys_df` is `<class
'pandas.core.frame.DataFrame'>` in Python.

```python
type(surveys_df)
surveys_df.__class__
```
We can also use the `surveys_df.dtypes` command to view the data type for each
column in our dataframe. `int64` represents numeric integer values - `int64`
cells cannot store decimals. `object` represents strings (letters and numbers).
`float64` represents numbers with decimals.

```python
surveys_df.dtypes
```

```
sex        object
wgt       float64
dtype: object
```
We'll talk a bit more about what the different formats mean in a different lesson.

### Useful Ways to View DataFrame objects in Python

There are multiple methods that can be used to summarize and access the data
stored in dataframes. Let's try out a few. Note that we call the method by using
the object name `surveys_df.method`. So `surveys_df.columns` provides an index
of all of the column names in our DataFrame.

## Challenges
Try out the methods below to see what they return.

1. `surveys_df.columns`
2. `surveys_df.head()`. Also, what does `surveys_df.head(15)` do?
3. `surveys_df.tail()`
4. `surveys_df.shape` - Take note of the output of the `shape` method. What format does it return the shape of the DataFrame in?

HINT: [More on tuples, here](https://docs.python.org/2/tutorial/datastructures.html#tuples-and-sequences).
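As the hint suggests, `shape` returns a tuple of `(number of rows, number of columns)`. A minimal sketch with a small made-up DataFrame (not the survey data):

```python
import pandas as pd

# a tiny stand-in DataFrame, just to inspect .shape
df = pd.DataFrame({'species': ['DO', 'DM', 'DO'],
                   'wgt': [42.0, 38.5, 44.1]})

print(df.shape)  # (3, 2) - a tuple of (rows, columns)

# because it is a tuple, it can be unpacked into two variables
nrows, ncols = df.shape
print(nrows, ncols)  # 3 2
```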



We often want to calculate summary statistics grouped by subsets or attributes
within fields of our data. For example, we might want to calculate the average
weight of all individuals per plot.

We can also extract basic statistics for all rows in a column, individually
using the syntax below:
```python
surveys_df['wgt'].count()
```

But if we want to summarize by one or more variables, for example sex, we can
use the `.groupby` method in Pandas. Once we've created a groupby DataFrame, we
can quickly calculate summary statistics by a group of our choice.

```python
# Group data by sex
sorted = surveys_df.groupby('sex')
# summary statistics for all numeric columns by sex
sorted.describe()
# provide the mean for each numeric column by sex
sorted.mean()
```

`sorted.mean()` **OUTPUT:**

```
     record_id        day         year       plot        wgt
sex
F  18036.412046  16.007138  1990.644997  11.440854  42.170555
M  17754.835601  16.184286  1990.480401  11.098282  42.995379
P  22488.000000  21.000000  1995.000000   8.000000  13.000000
```

is important to explore your data, before diving into analysis too quickly.
# Challenge

1. Have a look at the output of the `describe` method below. How many columns
represent the sex "Z", "P" or "R"?
2. What happens when you group by two columns using the syntax below and then grab
mean values:
   - `sorted2 = surveys_df.groupby(['plot','sex'])`
   - `sorted2.mean()`
3. Summarize weight values for each plot in your data. HINT: you can use the
following syntax to create summary statistics for just one column in your data,
where `byPlot = surveys_df.groupby('plot')`: `byPlot['wgt'].describe()`


Did you get #3 right? **A Snippet of the Output from challenge 3 looks like:**

```
plot
1  count    1903.000000
   mean       51.822911
   std        38.176670
```
## Quickly Creating Summary Counts in Pandas

Let's next create a list of unique species in our data. We can do this in a few
ways.

1. We can use the `nunique()` method, which returns a pandas Series of the unique
elements in the data and an associated count of how many rows contain each
element.

```python
species_list = surveys_df.record_id.groupby(surveys_df.species).nunique()
```
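A sketch of what `groupby` plus `nunique` produces, using a toy DataFrame rather than the survey file:

```python
import pandas as pd

# toy data standing in for the survey records
toy = pd.DataFrame({'record_id': [1, 2, 3, 4],
                    'species':   ['DO', 'DO', 'DM', 'DM']})

# count the unique record_ids within each species group
counts = toy.record_id.groupby(toy.species).nunique()
print(counts)
print(counts['DO'])  # 2
```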

Or, we can also count just the rows that have the species "DO":

```python
surveys_df.record_id.groupby(surveys_df['species']).nunique()['DO']
```

```python
total_count.plot(kind='bar');
```

1. Create a plot of average weight across all species per plot.
2. Create a plot of total males versus total females for the entire dataset.
3. Create a stacked bar plot that has male vs female for each plot.



being sex. Here is some data to help you solve this challenge:

```
            wgt
sex           F          M
plot
1     46.311138  55.950560
2     52.561845  51.391382
```

```python
my_plot = data.plot(kind='bar', stacked=True, title="Total Weight by Plot and Sex")
my_plot.set_xlabel("Plot")
my_plot.set_ylabel("Weight")
```


* You can use the command `unstack` to transform grouped data into columns for
plotting. Try running `surveys_df.unstack()` and see what it yields.
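A sketch of what `unstack` does to grouped data (toy numbers here, using the `plot`/`sex`/`wgt` column names from this lesson):

```python
import pandas as pd

toy = pd.DataFrame({'plot': [1, 1, 2, 2],
                    'sex':  ['F', 'M', 'F', 'M'],
                    'wgt':  [46.0, 56.0, 52.0, 51.0]})

# grouping by two columns yields a Series with a two-level index...
grouped = toy.groupby(['plot', 'sex'])['wgt'].mean()

# ...and unstack pivots the inner level ('sex') out into columns,
# which is the shape a stacked bar plot needs
table = grouped.unstack()
print(table)
```

Calling `table.plot(kind='bar', stacked=True)` on the result would then draw one bar per plot, split by sex.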
**08-working-with-sql.md** (4 changes: 2 additions & 2 deletions)

```python
con.close()
```

Storing your data in an SQLite database can provide substantial performance
improvements when reading/writing compared to CSV. The difference in performance
becomes more noticeable as the size of the dataset grows (see for example [these
benchmarks]).

[these benchmarks]: http://sebastianraschka.com/Articles/sqlite3_database.html#benchmarks
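As a minimal sketch of the SQLite round trip this section describes, using only the standard-library `sqlite3` module (the table and column names below are made up for illustration):

```python
import sqlite3

# an in-memory database; pass a filename such as "surveys.sqlite" to persist it
con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.execute("CREATE TABLE surveys (record_id INTEGER, species TEXT, wgt REAL)")
cur.executemany("INSERT INTO surveys VALUES (?, ?, ?)",
                [(1, 'DO', 42.0), (2, 'DM', 38.5)])
con.commit()

# pull back just the rows we need, without re-parsing a whole CSV file
cur.execute("SELECT species, wgt FROM surveys WHERE species = 'DO'")
print(cur.fetchall())  # [('DO', 42.0)]

con.close()
```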