## Simple pandas demo, Part 1

#### In this demo we will:
1. Access CSV files from a local directory
2. Report the data and columns
3. Return some simple statistics from a column with describe()
4. Calculate the average of three columns and store it in a new field
5. Save the updated CSV as a new file


Created by Elizabeth Tulanowski, GIS Instructor + Geospatial Centroid Education Coordinator, Colorado State University


#### 1. Access a CSV file using pandas

In [2]:
# Import libraries
import os
import pandas as pd

# Paths to data
path = r"C:\Student"
file = r"PrecipGaugeData.csv"
data = os.path.join(path, file)

# Read the CSV into a data frame (reuse the full path built above)
df = pd.read_csv(data)

print(f"Data frame for {data} has been created.")

Data frame for C:\Student\PrecipGaugeData.csv has been created.
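
If the folder or file name is wrong, `pd.read_csv` raises a `FileNotFoundError`. The sketch below shows one way to check the path first. The helper name `load_csv` is hypothetical, and the two data rows are copied from the table shown in this demo and written to a temporary folder, so the example runs without `C:\Student`.

```python
import os
import tempfile

import pandas as pd


def load_csv(folder, filename):
    """Hypothetical helper: return a DataFrame, or None if the file is missing."""
    full_path = os.path.join(folder, filename)
    if not os.path.isfile(full_path):
        print(f"File not found: {full_path}")
        return None
    return pd.read_csv(full_path)


# Stand-in data: the first two rows from this demo, written to a temp folder
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "PrecipGaugeData.csv"), "w") as f:
    f.write("Station,UTM_N_83,UTM_E_83,2017_Session1,2017_Session2,2017_Session3\n")
    f.write("1,4490823,449749,23.99,0.94,38.29\n")
    f.write("2,4490711,449820,14.40,0.54,29.82\n")

df_ok = load_csv(tmp_dir, "PrecipGaugeData.csv")   # file exists
df_missing = load_csv(tmp_dir, "NoSuchFile.csv")   # file does not exist

print(df_ok.shape)
```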


#### 2. Report the data and columns


In [19]:
# Print the first 5 rows
print(df.head())

# List the column names (https://www.geeksforgeeks.org/how-to-get-column-names-in-pandas-dataframe/)
for col in df.columns:
    print(col)

# Or as a single list (not as nicely formatted...)
print(list(df.columns))
print()

   Station  UTM_N_83  UTM_E_83  2017_Session1  2017_Session2  2017_Session3  \
0        1   4490823    449749          23.99           0.94          38.29   
1        2   4490711    449820          14.40           0.54          29.82   
2        3   4490701    449981          28.50           0.44          35.45   
3        4   4490817    450010          22.63           1.04          33.07   
4        5   4490897    450077          26.71           1.45          31.49   

   AvgPrecip  
0  21.073333  
1  14.920000  
2  21.463333  
3  18.913333  
4  19.883333  
Station
UTM_N_83
UTM_E_83
2017_Session1
2017_Session2
2017_Session3
AvgPrecip
['Station', 'UTM_N_83', 'UTM_E_83', '2017_Session1', '2017_Session2', '2017_Session3', 'AvgPrecip']



#### Need to delete a column?
Try the data frame's drop() method. Samples here:

https://www.activestate.com/resources/quick-reads/how-to-delete-a-column-row-from-a-dataframe/
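
As a minimal sketch of dropping a column, the example below builds a small stand-in data frame (the name `df_demo` and its three rows are for illustration only, with values taken from the table above) and removes `AvgPrecip`. By default `drop` returns a new data frame and leaves the original untouched.

```python
import pandas as pd

# Small stand-in table; values copied from the first rows shown in this demo
df_demo = pd.DataFrame({
    "Station": [1, 2, 3],
    "2017_Session1": [23.99, 14.40, 28.50],
    "AvgPrecip": [21.073333, 14.920000, 21.463333],
})

# drop() returns a new DataFrame; df_demo itself is not modified
df_trimmed = df_demo.drop(columns=["AvgPrecip"])

print(list(df_trimmed.columns))
print(list(df_demo.columns))
```

To change the original instead, assign the result back, for example `df_demo = df_demo.drop(columns=["AvgPrecip"])`.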

#### 3. Return some simple statistics from a column

In [11]:
# Get the mean, max, and min precipitation for the Session 1 column
# simple
print(df["2017_Session1"].mean())
print(df["2017_Session1"].max())
print(df["2017_Session1"].min())



25.802
35.19
14.4


In [12]:
# Get the mean, max, and min precipitation for one session column
# formatted and easier to update: change sess to report a different session

sess = "3"
field = "2017_Session" + sess

print(f"The mean precipitation for Session {sess} is: {round(df[field].mean(), 2)} mm")

print(f"Max precip for Session {sess}: {df[field].max()} mm")
print(f"Min precip for Session {sess}: {df[field].min()} mm")

The mean precipitation for Session 3 is: 32.37 mm
Max precip for Session 3: 39.71 mm
Min precip for Session 3: 25.29 mm


In [3]:
# Retrieve stats for all columns:
df.describe()

       Station   UTM_N_83    UTM_E_83  2017_Session1  2017_Session2  2017_Session3
count     20.0       20.0        20.0           20.0           20.0           20.0
mean      10.5  4490937.0    449895.2         25.802          1.224         32.371
std    5.91608   123.1828  204.373137       4.278596       0.465261       3.485165
min        1.0  4490701.0    449565.0           14.4           0.44          25.29
25%       5.75  4490867.0   449731.75          23.88         0.9375        30.7325
50%       10.5  4490938.0    449904.5          25.93          1.125         31.465
75%      15.25  4491019.0    450047.5          27.36          1.545          33.93
max       20.0  4491142.0    450240.0          35.19           1.98          39.71


In [10]:
# Retrieve stats for just one column:
df["2017_Session1"].describe()

# Field name gets passed in as a string in [ ] 


count    20.000000
mean     25.802000
std       4.278596
min      14.400000
25%      23.880000
50%      25.930000
75%      27.360000
max      35.190000
Name: 2017_Session1, dtype: float64
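
When only a few statistics are needed for a few columns, `agg` can return them in one small table. This is a sketch and not part of the original demo: `df_demo` is a stand-in built from the first five rows shown earlier, so its numbers differ from the full 20-row results above.

```python
import pandas as pd

# Stand-in data: session columns from the first five rows shown in this demo
df_demo = pd.DataFrame({
    "2017_Session1": [23.99, 14.40, 28.50, 22.63, 26.71],
    "2017_Session2": [0.94, 0.54, 0.44, 1.04, 1.45],
    "2017_Session3": [38.29, 29.82, 35.45, 33.07, 31.49],
})

session_cols = ["2017_Session1", "2017_Session2", "2017_Session3"]

# One row per statistic, one column per session
summary = df_demo[session_cols].agg(["mean", "min", "max"]).round(2)

print(summary)
```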

#### 4. Calculate the average of three columns and store it in a new field

In [13]:
# Calculate averages into a new field

# Resources:
# https://www.geeksforgeeks.org/create-a-new-column-in-pandas-dataframe-based-on-the-existing-columns/
# https://pandas.pydata.org/docs/getting_started/intro_tutorials/05_add_columns.html

# Create variables for the field names, using two methods:
# by position in the column list, and by name
f1 = df.columns[3]
f2 = "2017_Session2"
f3 = "2017_Session3"

try:
    print("Calculating averages.....")
    df["AvgPrecip"] = (df[f1] + df[f2] + df[f3]) / 3
    print("New values inserted into table")
except Exception as e:
    # Format the exception itself; e.args[0] may be missing or not a string
    print(f"Error: {e}")

Calculating averages.....
New values inserted into table
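
An alternative to adding the columns by hand is a row-wise mean over a list of columns, which scales better if more sessions are added. The sketch below uses a stand-in `df_demo` with the first two rows from this demo. One difference to be aware of: `mean(axis=1)` skips missing values by default, while adding the columns and dividing by 3 gives `NaN` for any row with a missing value.

```python
import pandas as pd

# Stand-in data: session columns from the first two rows shown in this demo
df_demo = pd.DataFrame({
    "2017_Session1": [23.99, 14.40],
    "2017_Session2": [0.94, 0.54],
    "2017_Session3": [38.29, 29.82],
})

session_cols = ["2017_Session1", "2017_Session2", "2017_Session3"]

# axis=1 averages across the columns of each row
df_demo["AvgPrecip"] = df_demo[session_cols].mean(axis=1)

print(df_demo)
```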


#### 5. Check that the table was updated, then save it

In [20]:
# Do we have a new column?
print("\nThe columns are now:")
for col in df.columns:
    print(col)

# Or check it with an if statement
if "AvgPrecip" in df.columns:
    print("\nField was created")
else:
    print("\nField was not created")

# Return data again with df.head()
print()
print(df.head(10))

# All this is done in memory in the data frame. It must be exported to a new CSV.

newfile = os.path.splitext(file)[0] + "_updated.csv"
# index=False keeps the row index from being written as an extra column
df.to_csv(os.path.join(path, newfile), index=False)
print("The updated table is saved as " + newfile)


The columns are now:
Station
UTM_N_83
UTM_E_83
2017_Session1
2017_Session2
2017_Session3
AvgPrecip

Field was created

   Station  UTM_N_83  UTM_E_83  2017_Session1  2017_Session2  2017_Session3  \
0        1   4490823    449749          23.99           0.94          38.29   
1        2   4490711    449820          14.40           0.54          29.82   
2        3   4490701    449981          28.50           0.44          35.45   
3        4   4490817    450010          22.63           1.04          33.07   
4        5   4490897    450077          26.71           1.45          31.49   
5        6   4490985    450041          26.98           1.00          32.53   
6        7   4491023    450116          25.92           1.83          29.26   
7        8   4491030    450240          23.55           0.92          27.86   
8        9   4491142    450217          24.80           0.63          31.22   
9       10   4491133    450067          32.03           1.41          30.95   

   AvgPrec
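
To confirm that an export worked, one option is to read the new file back and compare its columns with the data frame in memory. The sketch below assumes a temporary folder in place of `C:\Student` and a two-row stand-in `df_demo`. It writes with `index=False` so the row index is not saved as an extra unnamed column.

```python
import os
import tempfile

import pandas as pd

# Stand-in data: two rows with the new AvgPrecip column
df_demo = pd.DataFrame({
    "Station": [1, 2],
    "2017_Session1": [23.99, 14.40],
    "AvgPrecip": [21.073333, 14.920000],
})

# Temporary folder stands in for the demo's data directory
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "PrecipGaugeData_updated.csv")

# Write without the row index, then read the file back to check it
df_demo.to_csv(out_path, index=False)
df_check = pd.read_csv(out_path)

print(list(df_check.columns))
print(len(df_check), "rows written")
```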

### Pandas resources:
There is no shortage of resources and sample code out there. Here are a few to get you going:
* pandas documentation: https://pandas.pydata.org/docs/index.html
* Videos: https://www.youtube.com/playlist?list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS
* Tutorials and guides: https://www.w3resource.com/pandas/index.php

### The end.  Thanks for watching, and happy coding!