# Input and Output in Python
Getting data into a program can be one of the most time-intensive activities. Python has a number of packages designed to import and export data with minimal effort and code. I have included some simple examples below. The most popular package (*or at least the most ubiquitous*) these days is `pandas`. (I have included a pandas cheat sheet in the `main` repository.) `pandas` is a more advanced topic that will be covered later. Right now, all you need to know is that `pandas` introduces a new datatype called a `dataframe` (the `DataFrame` class), a powerful and flexible tool for data wrangling.

## Details for Navigating File Directories
**Details:** A single backslash does not work reliably when specifying a file path in a Python string, because the backslash is the escape character (for example, `\t` is read as a tab). Use a forward slash, add one more backslash as shown in the code below, or use a raw string such as `r'C:\Users'`.
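The path spellings described above can be compared directly. This is a minimal sketch; the file name `example.txt` is hypothetical and nothing is read from disk. It also shows `os.path.join` and `pathlib.Path`, two standard-library options that choose the separator for you.

```python
import os
from pathlib import Path

# Three ways to spell the same relative path to a hypothetical file.
forward = "data/example.txt"    # forward slash: accepted on Windows, macOS and Linux
escaped = "data\\example.txt"   # doubled backslash: stands for one literal backslash
raw = r"data\example.txt"       # raw string: backslashes are not treated as escapes

print(escaped == raw)           # True: both strings hold the same characters

# Let the standard library pick the right separator for the current system.
joined = os.path.join("data", "example.txt")
as_path = Path("data") / "example.txt"
print(joined)
print(as_path)
```

Building paths with `os.path.join` or `pathlib` avoids having to think about which slash the operating system expects.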

In [2]:
# This code is used to navigate the file structure of main. You may or may not need to run this.
import os
cwdup = os.path.split(os.getcwd())
os.chdir(cwdup[0])
print(cwdup)

('C:\\Users\\tomke\\My Drive\\Programs\\courses\\main', 'Python')


In [4]:
print(os.getcwd())

C:\Users\tomke\My Drive\Programs\courses\main


In [6]:
os.chdir(os.path.join(cwdup[0], cwdup[1]))

In [8]:
print(os.getcwd())

C:\Users\tomke\My Drive\Programs\courses\main\Python


In [10]:
txtfile = "data/globaltemps.txt"
csvfile = "data/sunspotsbyyear.csv"
xlsfile = "data/GlobalCarbonBudget2022.xlsx"

In [12]:
txtfile

'data/globaltemps.txt'

# Functions for Reading Files
Python has a number of native functions used to read files. 
## Native `open`
The built-in function `open` is the simplest and first function to use when opening plain text files. It is typically called with two arguments: a string filename and a string mode (the mode defaults to `'r'` if omitted). The available modes, adapted from [geeksforgeeks](https://www.geeksforgeeks.org/open-a-file-in-python/), are:


| Mode | Description |
|------|:-------------|
| 'r'  | Open text file for reading. Raises an I/O error if the file does not exist. |
| 'r+' | Open the file for reading and writing. Raises an I/O error if the file does not exist. |
| 'w'  | Open the file for writing. Truncates the file if it already exists. Creates a new file if it does not exist. |
| 'w+' | Open the file for reading and writing. Truncates the file if it already exists. Creates a new file if it does not exist. |
| 'a'  | Open the file for writing. The data being written will be inserted at the end of the file. Creates a new file if it does not exist. |
| 'a+' | Open the file for reading and writing. The data being written will be inserted at the end of the file. Creates a new file if it does not exist. |
| 'rb' | Open the file for reading in binary format. Raises an I/O error if the file does not exist. |
| 'rb+'| Open the file for reading and writing in binary format. Raises an I/O error if the file does not exist. |
| 'wb' | Open the file for writing in binary format. Truncates the file if it already exists. Creates a new file if it does not exist. |
| 'wb+'| Open the file for reading and writing in binary format. Truncates the file if it already exists. Creates a new file if it does not exist. |
| 'ab' | Open the file for appending in binary format. Inserts data at the end of the file. Creates a new file if it does not exist. |
| 'ab+'| Open the file for reading and appending in binary format. Inserts data at the end of the file. Creates a new file if it does not exist. |
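To see a few of the modes in the table in action, here is a minimal sketch. It works in a temporary directory, and the file name `modes_demo.txt` is made up for illustration, so no real data file is touched.

```python
import os
import tempfile

# Work in a throwaway directory so nothing on disk is overwritten.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "modes_demo.txt")  # hypothetical file name

    f = open(path, "w")   # 'w' creates the file (or truncates an existing one)
    f.write("first\n")
    f.close()

    f = open(path, "a")   # 'a' adds new data at the end of the file
    f.write("second\n")
    f.close()

    f = open(path, "r")   # 'r' opens the file for reading only
    contents = f.read()
    f.close()

    f = open(path, "w")   # reopening with 'w' empties the file again
    f.close()
    size_after_truncate = os.path.getsize(path)

print(repr(contents))
print(size_after_truncate)
```

The last step is the one to be careful with: opening an existing file with `'w'` discards its contents even if you never write anything.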




In [36]:
#Creates a file handle to reference
txthandle = open(txtfile, "r")
type(txthandle)

_io.TextIOWrapper

In [34]:
#Get the whole file's content in one string
txthandle.read()

'-0.12\n1856\t-0.25\n1857\t-0.28\n1858\t-0.21\n1859\t-0.09\n1860\t-0.23\n1861\t-0.25\n1862\t-0.3\n1863\t-0.15\n1864\t-0.19\n1865\t-0.13\n1866\t-0.1\n1867\t-0.17\n1868\t-0.14\n1869\t-0.11\n1870\t-0.17\n1871\t-0.17\n1872\t-0.17\n1873\t-0.19\n1874\t-0.21\n1875\t-0.2\n1876\t-0.22\n1877\t0.05\n1878\t0.06\n1879\t-0.12\n1880\t-0.17\n1881\t-0.13\n1882\t-0.13\n1883\t-0.22\n1884\t-0.31\n1885\t-0.32\n1886\t-0.26\n1887\t-0.31\n1888\t-0.17\n1889\t-0.08\n1890\t-0.28\n1891\t-0.19\n1892\t-0.25\n1893\t-0.32\n1894\t-0.23\n1895\t-0.19\n1896\t-0.06\n1897\t-0.07\n1898\t-0.23\n1899\t-0.16\n1900\t-0.08\n1901\t-0.13\n1902\t-0.23\n1903\t-0.32\n1904\t-0.43\n1905\t-0.24\n1906\t-0.21\n1907\t-0.34\n1908\t-0.4\n1909\t-0.44\n1910\t-0.4\n1911\t-0.41\n1912\t-0.33\n1913\t-0.33\n1914\t-0.14\n1915\t-0.1\n1916\t-0.31\n1917\t-0.44\n1918\t-0.29\n1919\t-0.24\n1920\t-0.25\n1921\t-0.18\n1922\t-0.25\n1923\t-0.24\n1924\t-0.23\n1925\t-0.2\n1926\t-0.09\n1927\t-0.18\n1928\t-0.17\n1929\t-0.31\n1930\t-0.11\n1931\t-0.06\n1932\t-0.12\n

In [32]:
#Get the first n characters
txthandle.read(9)

'.11\n1855\t'

In [54]:
#Read line by line
print(txthandle.readline())
print(txthandle.readline())

1852	-0.1

1853	-0.12



## The methods `seek()` and `tell()`
The file handle created by `open` keeps track of where it is in the file. The methods `seek` and `tell` exist to let you control and query the location of the file handle within the file. 

In [48]:
#seek method allows you to move around the file
txthandle.seek(0)

0

In [56]:
txthandle.tell()

47

In [58]:
#Always close your file handle!
txthandle.close()
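The interplay of reading, `tell`, and `seek` can be reproduced on a small throwaway file. This is a sketch under the assumption of a two-line tab-separated file like the temperature data; the file name `seek_demo.txt` is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "seek_demo.txt")  # hypothetical file name
    with open(path, "w") as out:
        out.write("1850\t-0.17\n1851\t-0.09\n")

    handle = open(path, "r")
    start = handle.tell()            # position right after opening the file
    first_line = handle.readline()   # reading moves the position forward
    after_read = handle.tell()       # position after the first line
    new_pos = handle.seek(0)         # jump back to the start; seek returns the new position
    first_again = handle.readline()  # the first line is read a second time
    handle.close()

print(start, new_pos)
print(after_read > start)
print(first_line == first_again)
```

This is also why a second call to `read()` on the same handle returns an empty string: the position is already at the end of the file until you `seek` back.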

Great! Can the `open` function open more complicated files? Sure! But beware that `open` returns the raw text, so delimiters such as commas appear in the string alongside special characters like `\n`.

In [68]:
csvhandle = open(csvfile,'r')

In [66]:
csvhandle.read()

''

In [70]:
csvhandle.read(81)

'1700.5,8.3,-1,-1,1\n1701.5,18.3,-1,-1,1\n1702.5,26.7,-1,-1,1\n1703.5,38.3,-1,-1,1\n17'

In [76]:
csvhandle.readline()
csvhandle.readline()


'1701.5,18.3,-1,-1,1\n'

In [78]:
csvhandle.close()

## Numpy
The package `numpy` contains functions for reading files; two are `loadtxt` and `genfromtxt`.

`loadtxt` reads the file data into a numpy array.

In [82]:
import numpy as np
temps = np.loadtxt(txtfile)
type(temps)

numpy.ndarray

In [84]:
temps[1,1]

-0.09

In [86]:
temps[1][1]

-0.09
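Because `loadtxt` returns an ordinary two-dimensional array, whole columns can be pulled out with slicing. The sketch below assumes a small in-memory sample (the first three rows of the temperature data) wrapped in `io.StringIO` instead of the real file; the variable names are made up for illustration.

```python
import io
import numpy as np

# In-memory stand-in for a tab-separated file (first rows of the temperature data).
sample = io.StringIO("1850\t-0.17\n1851\t-0.09\n1852\t-0.1\n")

sample_temps = np.loadtxt(sample)   # whitespace (including tabs) is the default separator
print(sample_temps.shape)           # (number of rows, number of columns)

years = sample_temps[:, 0]          # every row, first column
anomalies = sample_temps[:, 1]      # every row, second column
print(years)
print(anomalies)

# Both indexing styles reach the same element.
print(sample_temps[1, 1] == sample_temps[1][1])
```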

The function `genfromtxt` is more flexible in that it can return a masked array and use filling values for **missing data**. By default it splits values on whitespace; for other formats, such as comma-separated files, you must specify a `delimiter`, the character that separates values in the file.
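To make the missing-data handling concrete, here is a small sketch on made-up comma-separated text in which one field is empty. The text and the fill value `-999.0` are assumptions chosen for illustration; they are not taken from the course data files.

```python
import io
import numpy as np

# Made-up CSV text: the second row has an empty (missing) second field.
text = "1850,-0.17\n1851,\n1852,-0.10\n"

# Replace missing fields with a chosen fill value.
filled = np.genfromtxt(io.StringIO(text), delimiter=",", filling_values=-999.0)
print(filled)

# Or keep track of which entries were missing by asking for a masked array.
masked = np.genfromtxt(io.StringIO(text), delimiter=",", usemask=True)
print(masked.mask)
```

With a masked array, calculations such as `masked.mean(axis=0)` skip the masked entries instead of silently using a placeholder value.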

In [88]:
temp = np.genfromtxt(txtfile,delimiter='\t')
temp

array([[ 1.850e+03, -1.700e-01],
       [ 1.851e+03, -9.000e-02],
       [ 1.852e+03, -1.000e-01],
       [ 1.853e+03, -1.200e-01],
       [ 1.854e+03, -1.100e-01],
       [ 1.855e+03, -1.200e-01],
       [ 1.856e+03, -2.500e-01],
       [ 1.857e+03, -2.800e-01],
       [ 1.858e+03, -2.100e-01],
       [ 1.859e+03, -9.000e-02],
       [ 1.860e+03, -2.300e-01],
       [ 1.861e+03, -2.500e-01],
       [ 1.862e+03, -3.000e-01],
       [ 1.863e+03, -1.500e-01],
       [ 1.864e+03, -1.900e-01],
       [ 1.865e+03, -1.300e-01],
       [ 1.866e+03, -1.000e-01],
       [ 1.867e+03, -1.700e-01],
       [ 1.868e+03, -1.400e-01],
       [ 1.869e+03, -1.100e-01],
       [ 1.870e+03, -1.700e-01],
       [ 1.871e+03, -1.700e-01],
       [ 1.872e+03, -1.700e-01],
       [ 1.873e+03, -1.900e-01],
       [ 1.874e+03, -2.100e-01],
       [ 1.875e+03, -2.000e-01],
       [ 1.876e+03, -2.200e-01],
       [ 1.877e+03,  5.000e-02],
       [ 1.878e+03,  6.000e-02],
       [ 1.879e+03, -1.200e-01],
       [ 1

In [90]:
sunspots = np.genfromtxt(csvfile,delimiter=',')
sunspots

array([[ 1.7005e+03,  8.3000e+00, -1.0000e+00, -1.0000e+00,  1.0000e+00],
       [ 1.7015e+03,  1.8300e+01, -1.0000e+00, -1.0000e+00,  1.0000e+00],
       [ 1.7025e+03,  2.6700e+01, -1.0000e+00, -1.0000e+00,  1.0000e+00],
       ...,
       [ 2.0135e+03,  9.4000e+01,  6.9000e+00,  5.3470e+03,  1.0000e+00],
       [ 2.0145e+03,  1.1330e+02,  8.0000e+00,  5.2730e+03,  1.0000e+00],
       [ 2.0155e+03,  6.9800e+01,  6.4000e+00,  8.9030e+03,  1.0000e+00]])

In [92]:
sunspots = np.genfromtxt(csvfile,delimiter=',', usemask = True, filling_values=np.nan)
sunspots

masked_array(
  data=[[1700.5, 8.3, -1.0, -1.0, 1.0],
        [1701.5, 18.3, -1.0, -1.0, 1.0],
        [1702.5, 26.7, -1.0, -1.0, 1.0],
        [1703.5, 38.3, -1.0, -1.0, 1.0],
        [1704.5, 60.0, -1.0, -1.0, 1.0],
        [1705.5, 96.7, -1.0, -1.0, 1.0],
        [1706.5, 48.3, -1.0, -1.0, 1.0],
        [1707.5, 33.3, -1.0, -1.0, 1.0],
        [1708.5, 16.7, -1.0, -1.0, 1.0],
        [1709.5, 13.3, -1.0, -1.0, 1.0],
        [1710.5, 5.0, -1.0, -1.0, 1.0],
        [1711.5, 0.0, -1.0, -1.0, 1.0],
        [1712.5, 0.0, -1.0, -1.0, 1.0],
        [1713.5, 3.3, -1.0, -1.0, 1.0],
        [1714.5, 18.3, -1.0, -1.0, 1.0],
        [1715.5, 45.0, -1.0, -1.0, 1.0],
        [1716.5, 78.3, -1.0, -1.0, 1.0],
        [1717.5, 105.0, -1.0, -1.0, 1.0],
        [1718.5, 100.0, -1.0, -1.0, 1.0],
        [1719.5, 65.0, -1.0, -1.0, 1.0],
        [1720.5, 46.7, -1.0, -1.0, 1.0],
        [1721.5, 43.3, -1.0, -1.0, 1.0],
        [1722.5, 36.7, -1.0, -1.0, 1.0],
        [1723.5, 18.3, -1.0, -1.0, 1.0],
     

## Common Way with no 'close' necessary
Using the `with` statement with `open` gives you a quick and concise way of reading a file without having to close the file handle yourself: the handle is closed automatically when the indented block ends. Below is an example of the syntax.

In [94]:
with open(txtfile) as glob:
    #print(glob.read())
    temps = glob.read()

In [98]:
temps

'1850\t-0.17\n1851\t-0.09\n1852\t-0.1\n1853\t-0.12\n1854\t-0.11\n1855\t-0.12\n1856\t-0.25\n1857\t-0.28\n1858\t-0.21\n1859\t-0.09\n1860\t-0.23\n1861\t-0.25\n1862\t-0.3\n1863\t-0.15\n1864\t-0.19\n1865\t-0.13\n1866\t-0.1\n1867\t-0.17\n1868\t-0.14\n1869\t-0.11\n1870\t-0.17\n1871\t-0.17\n1872\t-0.17\n1873\t-0.19\n1874\t-0.21\n1875\t-0.2\n1876\t-0.22\n1877\t0.05\n1878\t0.06\n1879\t-0.12\n1880\t-0.17\n1881\t-0.13\n1882\t-0.13\n1883\t-0.22\n1884\t-0.31\n1885\t-0.32\n1886\t-0.26\n1887\t-0.31\n1888\t-0.17\n1889\t-0.08\n1890\t-0.28\n1891\t-0.19\n1892\t-0.25\n1893\t-0.32\n1894\t-0.23\n1895\t-0.19\n1896\t-0.06\n1897\t-0.07\n1898\t-0.23\n1899\t-0.16\n1900\t-0.08\n1901\t-0.13\n1902\t-0.23\n1903\t-0.32\n1904\t-0.43\n1905\t-0.24\n1906\t-0.21\n1907\t-0.34\n1908\t-0.4\n1909\t-0.44\n1910\t-0.4\n1911\t-0.41\n1912\t-0.33\n1913\t-0.33\n1914\t-0.14\n1915\t-0.1\n1916\t-0.31\n1917\t-0.44\n1918\t-0.29\n1919\t-0.24\n1920\t-0.25\n1921\t-0.18\n1922\t-0.25\n1923\t-0.24\n1924\t-0.23\n1925\t-0.2\n1926\t-0.09\n1927\t-0

In [102]:
glob.readline()

ValueError: I/O operation on closed file.
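A common companion to `with` is looping over the handle, which yields one line at a time. The sketch below assumes a small tab-separated sample written to a temporary file (the name `sample_temps.txt` is hypothetical) and converts each line into numbers.

```python
import os
import tempfile

sample = "1850\t-0.17\n1851\t-0.09\n1852\t-0.1\n"  # first rows of the temperature data

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample_temps.txt")  # hypothetical file name
    with open(path, "w") as out:
        out.write(sample)

    years, anomalies = [], []
    with open(path) as handle:
        for line in handle:                 # one line per iteration
            year, value = line.split("\t")  # split the line on the tab character
            years.append(int(year))
            anomalies.append(float(value))  # float() ignores the trailing newline

print(years)
print(anomalies)
print(handle.closed)  # True: the with block closed the file for us
```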

## Pandas
When the package `pandas` is loaded, the `read_table()` function can pull data from a text file. You could also use `read_csv()` with `sep="\t"` to read data from a tab-separated file, or with `sep="\s+"` for whitespace-separated values. By default, `pandas` will look for a header row unless told otherwise (for example with `header=None`). All the functions below return a `dataframe`, an object that stores tabular data and allows column access with a certain syntax that I often refer to as "dot notation" (for example, `csv3.stdev`).

In [104]:
import pandas as pd
txt1 = pd.read_table(txtfile, header=None,names=['year','temp'])
txt2 = pd.read_csv(txtfile,header=None,sep=r'\s+')  # raw string so that \s is not treated as a string escape
csv1 = pd.read_csv(csvfile,header=None)
csv2 = pd.read_csv(csvfile,header=None,names = ['year', 'numspots', 'stdev','Nobs','confirmed'])

In [106]:
txt1

Unnamed: 0,year,temp
0,1850,-0.17
1,1851,-0.09
2,1852,-0.10
3,1853,-0.12
4,1854,-0.11
...,...,...
169,2019,0.98
170,2020,1.01
171,2021,0.86
172,2022,0.91


In [108]:
txt2

Unnamed: 0,0,1
0,1850,-0.17
1,1851,-0.09
2,1852,-0.10
3,1853,-0.12
4,1854,-0.11
...,...,...
169,2019,0.98
170,2020,1.01
171,2021,0.86
172,2022,0.91


In [112]:
csv2

Unnamed: 0,year,numspots,stdev,Nobs,confirmed
0,1700.5,8.3,-1.0,-1,1
1,1701.5,18.3,-1.0,-1,1
2,1702.5,26.7,-1.0,-1,1
3,1703.5,38.3,-1.0,-1,1
4,1704.5,60.0,-1.0,-1,1
...,...,...,...,...,...
311,2011.5,80.8,6.7,6077,1
312,2012.5,84.5,6.7,5753,1
313,2013.5,94.0,6.9,5347,1
314,2014.5,113.3,8.0,5273,1


In [114]:
type(csv2)

pandas.core.frame.DataFrame
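The reading functions used above can be compared on a small in-memory sample. This is a sketch that assumes three rows of the temperature data wrapped in `io.StringIO` in place of the real file; it also shows what happens when `header=None` is left out.

```python
import io
import pandas as pd

text = "1850\t-0.17\n1851\t-0.09\n1852\t-0.1\n"  # first rows of the temperature data
cols = ["year", "temp"]

# Three ways to read the same tab-separated text.
a = pd.read_table(io.StringIO(text), header=None, names=cols)
b = pd.read_csv(io.StringIO(text), header=None, names=cols, sep="\t")
c = pd.read_csv(io.StringIO(text), header=None, names=cols, sep=r"\s+")
print(a.equals(b), a.equals(c))

# Without header=None, the first data row is consumed as column names.
with_header = pd.read_table(io.StringIO(text))
print(len(a), len(with_header))
```

Forgetting `header=None` on a file without a header row is an easy way to lose the first observation, so it is worth checking the row count after loading.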

As `csv2` shows, you can add column names while loading the file by passing the `names` option.

If you look at `csvfile`, you'll see some columns with `-1` as a value; this indicates *missing data*. Classifying your missing data properly will help you avoid accidentally using the value in a calculation. You can specify this with another option, `na_values`:

In [116]:
csv3 = pd.read_csv(csvfile,header=None,names = ['year', 'numspots', 'stdev','Nobs','confirmed'],na_values=['-1'])
#csv3.stdev
print(csv3)
csv3['stdev']

       year  numspots  stdev    Nobs  confirmed
0    1700.5       8.3    NaN     NaN          1
1    1701.5      18.3    NaN     NaN          1
2    1702.5      26.7    NaN     NaN          1
3    1703.5      38.3    NaN     NaN          1
4    1704.5      60.0    NaN     NaN          1
..      ...       ...    ...     ...        ...
311  2011.5      80.8    6.7  6077.0          1
312  2012.5      84.5    6.7  5753.0          1
313  2013.5      94.0    6.9  5347.0          1
314  2014.5     113.3    8.0  5273.0          1
315  2015.5      69.8    6.4  8903.0          1

[316 rows x 5 columns]


0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
      ... 
311    6.7
312    6.7
313    6.9
314    8.0
315    6.4
Name: stdev, Length: 316, dtype: float64
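The effect of `na_values` on a calculation can be checked on a tiny sample. The sketch below assumes three rows copied from the sunspot data shown above and held in memory with `io.StringIO`; it also compares "dot notation" with bracket access.

```python
import io
import pandas as pd

# Three rows of the sunspot data, held in memory instead of a file.
text = "1700.5,8.3,-1,-1,1\n1701.5,18.3,-1,-1,1\n2015.5,69.8,6.4,8903,1\n"
cols = ["year", "numspots", "stdev", "Nobs", "confirmed"]

raw = pd.read_csv(io.StringIO(text), header=None, names=cols)
clean = pd.read_csv(io.StringIO(text), header=None, names=cols, na_values=["-1"])

# Dot notation and bracket notation return the same column.
same_column = clean.stdev.equals(clean["stdev"])

n_missing = int(clean["stdev"].isna().sum())  # how many entries were flagged as missing
mean_raw = raw["stdev"].mean()                # the -1 placeholders drag the mean down
mean_clean = clean["stdev"].mean()            # NaN entries are skipped by default

print(same_column, n_missing)
print(mean_raw, mean_clean)
```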

### Reading Excel files with Pandas
`pandas` will read Excel files in the same manner. You can specify the sheet to read and the rows and columns to import.

In [118]:
xcel1 = pd.read_excel(xlsfile,sheet_name="Global Carbon Budget", skiprows=20,header=0)

In [120]:
print(xcel1)
print(type(xcel1))
xcel1['Year'][2]

    Year  fossil emissions excluding carbonation  land-use change emissions  \
0   1959                                2.417091                   1.938933   
1   1960                                2.562137                   1.792600   
2   1961                                2.570540                   1.666500   
3   1962                                2.661315                   1.608267   
4   1963                                2.803399                   1.542733   
..   ...                                     ...                        ...   
58  2017                                9.851730                   1.182300   
59  2018                               10.050902                   1.141200   
60  2019                               10.120786                   1.243800   
61  2020                                9.624478                   1.107467   
62  2021                               10.132055                   1.075067   

    atmospheric growth  ocean sink  land sink  ceme

1961

## Writing to a Plain Text file
Python contains built-in functions to output information to a plain text file. The process involves opening a file for writing, writing to that file, and then closing it.

In [124]:
txt1

Unnamed: 0,year,temp
0,1850,-0.17
1,1851,-0.09
2,1852,-0.10
3,1853,-0.12
4,1854,-0.11
...,...,...
169,2019,0.98
170,2020,1.01
171,2021,0.86
172,2022,0.91


In [134]:
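# To write the dataframe instead, first convert it to a string:
#txt1 = txt1.to_string()
# Here temps is the string read from the text file in the `with` example above.
f = open("test1.txt", "w+")  # "w+" truncates the file if it already exists
f.write(temps)
f.write('\n')
f.close()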

You may also append to a file (add to it without overwriting).

In [132]:
f = open("test1.txt", "a+")  # "a+" keeps the existing contents and writes at the end
f.write(temps)
f.write('\n')
f.close()
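Note that `write` only accepts strings, so numeric data has to be formatted first. The following is a minimal sketch assuming a short list of made-up (year, temperature) pairs and a hypothetical output file in a temporary directory.

```python
import os
import tempfile

rows = [(1850, -0.17), (1851, -0.09)]  # sample (year, temperature) pairs

# Turn each pair into one tab-separated line of text.
lines = [f"{year}\t{temp:.2f}\n" for year, temp in rows]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "formatted_out.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.writelines(lines)      # writes every string in the list, in order

    with open(path) as f:
        written = f.read()       # read the file back to check the result

print(repr(written))
```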

## Writing to a CSV file
Python does allow you to write to a `csv` file. The details are not especially informative at this point, but if that is something you want to do, please look into the package `csv`.
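For a first impression of the `csv` package, here is a minimal sketch. It assumes two made-up rows in the style of the sunspot data and writes them to a hypothetical file in a temporary directory, then reads the file back.

```python
import csv
import os
import tempfile

rows = [(1700.5, 8.3), (1701.5, 18.3)]  # sample (year, numspots) pairs

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sunspots_out.csv")  # hypothetical file name

    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["year", "numspots"])  # header row
        writer.writerows(rows)                 # one line per tuple

    with open(path, newline="") as f:
        read_back = list(csv.reader(f))        # every value comes back as a string

print(read_back)
```

If the data already lives in a `pandas` dataframe, its `to_csv` method is usually the shorter route.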

## Writing to an Excel file
`pandas` contains functionality that allows you to write data to a Microsoft Excel file. The method `to_excel` writes a dataframe to a sheet within an Excel spreadsheet, but to write multiple sheets to the same file you will need to create an `ExcelWriter` object and pass it to each `to_excel` call. `ExcelWriter` does require a bit more syntax, as shown below:

In [136]:
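# You must specify a file; you can also specify a sheet.
csv2.to_excel("sunspotsout.xlsx",sheet_name='sun')
# Note: calling to_excel again with the same filename overwrites the file,
# so only the 'carbon accounting' sheet remains afterwards.
xcel1.to_excel("sunspotsout.xlsx",sheet_name='carbon accounting')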

In [138]:
with pd.ExcelWriter('pandas_simple.xlsx') as writer:
    csv2.to_excel(writer, sheet_name='sun')
    xcel1.to_excel(writer, sheet_name='carbon')