<div class="alert alert-block alert-info">
<h2>Importing and Exporting Data Files with NumPy</h2><br>

</div>

In [1]:
# Once again, we need to import the NumPy module into JN for this coding session to access its functions.

import numpy as np

In [2]:
# Import the csv ('comma-separated values') file called 'winter.csv'.
# You should have uploaded the csv file into your JN directory.
# The NumPy function for importing structured data is genfromtxt().
# Once the file is open for reading, genfromtxt() splits each non-empty line into a sequence of strings.
# Empty or commented lines are skipped.
# The delimiter argument defines how the splitting should take place.
# Comma-separated values (CSV) files use a comma (,) to mark the separation between columns.

npWinter1 = np.genfromtxt("winter.csv", delimiter=",")

# The alternative function is loadtxt(), which is faster,
# but, unlike genfromtxt(), it does not handle missing values.
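
The difference in how missing values are handled can be sketched with a small in-memory file. The text below is a hypothetical stand-in for a csv file with one empty field; it is not the contents of 'winter.csv'.

```python
import io

import numpy as np

# Hypothetical csv text with one missing value (Jan 2017 is left empty).
csv_with_gap = "Year,Dec,Jan,Feb\n2017,12.4,,16.6\n2018,16.3,22.2,20.1\n"

# genfromtxt() fills the empty field with nan instead of failing.
with_gap = np.genfromtxt(io.StringIO(csv_with_gap), delimiter=",", skip_header=1)
print(with_gap)

# loadtxt() is a reasonable choice when every field is present.
csv_complete = "Year,Dec,Jan,Feb\n2017,12.4,14.3,16.6\n2018,16.3,22.2,20.1\n"
complete = np.loadtxt(io.StringIO(csv_complete), delimiter=",", skiprows=1)
print(complete)
```

Note that the two functions name the header-skipping argument differently: skip_header in genfromtxt() and skiprows in loadtxt().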

In [3]:
# Let's print the imported file to see it in the output.

print(npWinter1)

[[   nan    nan    nan    nan]
 [2017.    12.4   14.3   16.6]
 [2018.    16.3   22.2   20.1]]


In [4]:
# The csv file contains mixed data: textual (the header row) and numeric.
# As you can see in the output above, the textual data has been replaced with 'nan' (not a number).
# We can either remove the rows with textual data OR force all data to be displayed in the output.
# If we are confident that the textual data, especially in the header, is not essential, we can skip it:

npWinter2 = np.genfromtxt("winter.csv", delimiter=",", skip_header=1)

print(npWinter2)

[[2017.    12.4   14.3   16.6]
 [2018.    16.3   22.2   20.1]]


In [5]:
# To print out all the mixed data, set the dtype argument to None as below.
# With dtype=None, genfromtxt() infers the data type of each column from the values it finds in the file,
# which is useful when a file contains mixed types of data.

npWinter3 =  np.genfromtxt("winter.csv", delimiter=",", dtype=None, encoding=None)
print(npWinter3)

# In older NumPy versions, the output may show a 'b' before each value.
# There, np.genfromtxt() reads strings as bytes by default, a holdover from Python 2, where byte strings were the default string type.
# Python 3 strings are Unicode, and byte strings are displayed with a 'b' prefix unless we set the encoding argument to None.

[['Year' 'Dec' 'Jan' 'Feb']
 ['2017' '12.4' '14.3' '16.6']
 ['2018' '16.3' '22.2' '20.1']]
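
Another option, sketched here as an assumption rather than as part of the original lesson, is to pass names=True so that the header row becomes field names and the numeric columns keep their numeric types. The in-memory text stands in for 'winter.csv' and assumes the file matches the output shown above.

```python
import io

import numpy as np

# Stand-in for winter.csv, reconstructed from the printed output above.
csv_text = "Year,Dec,Jan,Feb\n2017,12.4,14.3,16.6\n2018,16.3,22.2,20.1\n"

# names=True reads the first line as field names; dtype=None infers a type per column.
winter = np.genfromtxt(
    io.StringIO(csv_text), delimiter=",", names=True, dtype=None, encoding=None
)

print(winter.dtype.names)  # field names taken from the header row
print(winter["Dec"])       # one column, selected by name
```

The result is a one-dimensional structured array with one record per data row, so columns are selected by name rather than by position.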


In [6]:
# We can select part of a NumPy array by index position, as we did with lists and tuples.
# Indexing with [2] returns the third row, because positions start at 0.

print(npWinter3[2])

['2018' '16.3' '22.2' '20.1']
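
Two-dimensional arrays also accept a row and a column position at once. The sketch below rebuilds the numeric part of the data by hand (the variable names are illustrative) and selects a row, a column, and part of a row.

```python
import numpy as np

# Numeric rows from the output above, typed in by hand for illustration.
temps = np.array([[2017, 12.4, 14.3, 16.6],
                  [2018, 16.3, 22.2, 20.1]])

last_row = temps[-1]          # negative positions count from the end
dec_column = temps[:, 1]      # every row, second column (Dec)
jan_feb_2017 = temps[0, 2:]   # first row, third column onwards

print(last_row)
print(dec_column)
print(jan_feb_2017)
```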


In [7]:
# We can save a NumPy array as a csv or text file.
# The fmt argument specifies the output format. By default, numbers are written as floats in scientific notation ('%.18e').
# '%d' stores numbers as integers, while '%s' writes the values as strings.

np.savetxt("winter3.csv", npWinter3, delimiter=",", fmt='%s')

# Alternatively, the npWinter3.tofile() method writes array data from the Python environment to an external file.
# However, it stores no shape or data type information, so it is a poor fit for csv output.

# For more on saving arrays to external files, see:
# https://thispointer.com/how-to-save-numpy-array-to-a-csv-file-using-numpy-savetxt-in-python/
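
For numeric arrays, savetxt() can also write a header line and a fixed number of decimals. The following is a minimal sketch that writes to a temporary directory and reads the file back; the file name 'winter_demo.csv' is made up for the example.

```python
import os
import tempfile

import numpy as np

temps = np.array([[2017, 12.4, 14.3, 16.6],
                  [2018, 16.3, 22.2, 20.1]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "winter_demo.csv")  # hypothetical file name

    # comments="" stops savetxt() from prefixing the header line with '# '.
    np.savetxt(path, temps, delimiter=",", fmt="%.1f",
               header="Year,Dec,Jan,Feb", comments="")

    with open(path) as f:
        first_line = f.readline().strip()

    # Read the numbers back, skipping the header line.
    restored = np.loadtxt(path, delimiter=",", skiprows=1)

print(first_line)
print(restored)
```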

<div class="alert alert-block alert-info">
    <h1>Importing and Exporting Data Files with pandas</h1>
    <p></p>
 </div>

In [8]:
# With the pandas module, we can read the same csv file into our working environment as a DataFrame.
# Assuming that you have already installed pandas on your machine,
# import pandas under the alias 'pd' for this coding session.

import pandas as pd


In [9]:
# Create a variable pdWinter to store the csv file as a DataFrame.

pdWinter = pd.read_csv("winter.csv", sep=",")

# Display the contents of the variable.
# As you can see, pandas DataFrames cope nicely with mixed data.

pdWinter

   Year   Dec   Jan   Feb
0  2017  12.4  14.3  16.6
1  2018  16.3  22.2  20.1
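
pandas copes with mixed data because read_csv() treats the first line as column names and infers a data type for each column separately. A minimal sketch, using in-memory text that assumes 'winter.csv' matches the output shown above:

```python
import io

import pandas as pd

# Stand-in for winter.csv, reconstructed from the displayed DataFrame.
csv_text = "Year,Dec,Jan,Feb\n2017,12.4,14.3,16.6\n2018,16.3,22.2,20.1\n"

df = pd.read_csv(io.StringIO(csv_text))

print(list(df.columns))  # column names come from the header line
print(df.dtypes)         # one inferred data type per column
```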


In [10]:
# Add a new row to pdWinter to include the forecast for winter 2019.
# First we need to define the data for the new row we want to add. This is where dictionaries come in handy.

row = pd.Series({'Year':2019,'Dec':9,'Jan':7,'Feb':11}, name=2)

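# Then add the row to pdWinter. DataFrame.append() was removed in pandas 2.0,
# so we turn the Series into a one-row DataFrame and use pd.concat() instead.

Winter2017_2019 = pd.concat([pdWinter, row.to_frame().T])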

Winter2017_2019

   Year   Dec   Jan   Feb
0  2017  12.4  14.3  16.6
1  2018  16.3  22.2  20.1
2  2019   9.0   7.0  11.0


In [11]:
# We can select the values of a column by its name.

Winter2017_2019['Dec']

0    12.4
1    16.3
2     9.0
Name: Dec, dtype: float64
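
Selection by name extends to several columns at once, and to rows that meet a condition. The sketch below rebuilds the three-row table by hand; the variable names and the 12-degree threshold are illustrative choices.

```python
import pandas as pd

winter = pd.DataFrame({
    "Year": [2017, 2018, 2019],
    "Dec": [12.4, 16.3, 9.0],
    "Jan": [14.3, 22.2, 7.0],
    "Feb": [16.6, 20.1, 11.0],
})

# A list of names returns a DataFrame with just those columns.
year_and_dec = winter[["Year", "Dec"]]

# A boolean condition inside .loc keeps only the matching rows.
mild_years = winter.loc[winter["Dec"] > 12, "Year"]

print(year_and_dec)
print(mild_years)
```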

In [12]:
# We can also add new columns to the existing data in a DataFrame. This time we want to add temperatures for the spring months.
# We can use the pandas method assign() and pass each new column as a list.

Winter_Spring = pdWinter.assign(March=[15.3, 16.7], April=[17.1, 17.5], May=[24.2, 20])

Winter_Spring

   Year   Dec   Jan   Feb  March  April   May
0  2017  12.4  14.3  16.6   15.3   17.1  24.2
1  2018  16.3  22.2  20.1   16.7   17.5  20.0
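
assign() also accepts a function, which is handy when the new column is computed from existing ones. As a sketch, the column name 'WinterMean' below is a made-up example. assign() returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

winter = pd.DataFrame({
    "Year": [2017, 2018],
    "Dec": [12.4, 16.3],
    "Jan": [14.3, 22.2],
    "Feb": [16.6, 20.1],
})

# The lambda receives the DataFrame and returns the values for the new column.
with_mean = winter.assign(
    WinterMean=lambda d: d[["Dec", "Jan", "Feb"]].mean(axis=1)
)

print(with_mean)
```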


In [13]:
# Finally, we can export the expanded data from the 'Winter_Spring' DataFrame to a csv file.
# The argument index=False prevents the index column from being written to the external file.

Winter_Spring.to_csv("winter_spring.csv", index=False)
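
The effect of index=False is easiest to see by comparing the first line of the output with and without it. When no file name is given, to_csv() returns the csv text as a string, which the sketch below uses to avoid writing any files.

```python
import pandas as pd

winter = pd.DataFrame({
    "Year": [2017, 2018],
    "Dec": [12.4, 16.3],
    "Jan": [14.3, 22.2],
    "Feb": [16.6, 20.1],
})

# Without a path, to_csv() returns the text instead of writing a file.
lines_without_index = winter.to_csv(index=False).splitlines()
lines_with_index = winter.to_csv().splitlines()

print(lines_without_index[0])  # header only
print(lines_with_index[0])     # header preceded by an empty index column
```

With the default index=True, reading the file back with read_csv() would produce an extra 'Unnamed: 0' column.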

In [14]:
# We can convert pandas DataFrames to NumPy arrays for further statistical manipulation:

npTemps = np.array(Winter_Spring)

print(npTemps)

# In the 2nd part of this class, we did the opposite: we converted a NumPy array into a pandas DataFrame.

[[2017.    12.4   14.3   16.6   15.3   17.1   24.2]
 [2018.    16.3   22.2   20.1   16.7   17.5   20. ]]
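
The same conversion can be written with the DataFrame's own to_numpy() method, and the reverse direction needs only the column names. A minimal sketch with a hand-built table (the variable names are illustrative):

```python
import pandas as pd

spring = pd.DataFrame({
    "Year": [2017, 2018],
    "March": [15.3, 16.7],
    "April": [17.1, 17.5],
    "May": [24.2, 20.0],
})

# Mixed integer and float columns end up in a single float array.
values = spring.to_numpy()

# Going back: the array supplies the values, the columns supply the labels.
restored = pd.DataFrame(values, columns=spring.columns)

print(values.dtype)
print(restored)
```

Because a NumPy array holds a single data type, the Year column comes back as floats (2017.0, 2018.0) rather than integers.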
