<div align ="right">Thomas Jefferson University <b>COMP 102</b>: Intro to Scientific Computing</div>

# Additional library examples

Here are a few more fun examples and an assignment to reinforce our knowledge of libraries. 

## the os library

The following examples use the `os` library to sort through and report on the contents of a data folder. I have stuffed a bunch of irregular folder structure into the 'data' folder. You can click around and see that it would be hard to write code to access all of the data, because folders and subfolders are embedded in hierarchies of different depths. 

<a href = 'https://docs.python.org/3/library/os.html'> os library documentation</a>

The code blocks below show some examples of things we can do with `os`. 

In [None]:
import os

cwd = os.getcwd()   # cwd stands for 'current working directory'
print(cwd)          ## so this handy command will return the path 
                    ## to the directory where your files will be
                    ## written and your project data is stored

print()        
print(type(cwd))


In [None]:
# In this example os.walk() acts on the folder data. At each         
## directory level it creates an object that consists of the 
## path to the directory, a list of directories within that 
## directory, and a list of all the file names in that directory.
        
file_list = []    # make two empty lists           
path_list = []

# each entry in the `walk` object will contain 
## all three pieces of information

for (path, directory, files) in os.walk('data'):
    
    # for every file in the list of files in a directory that doesn't
    ## start with '.' (is not a 'hidden' file)
    
    file = [f for f in files if not f[0] == '.'] 
    
    # add the filename and path of each file found by 'walking' the data 
    
    file_list.append(file)
    path_list.append(path)

# build a dictionary that maps each directory path to its list of files
file_path_dict = {path_list[i]: file_list[i] for i in range(len(file_list))}

print(file_path_dict)
    

This output is messy, but you can see that it consists of pairs of directory names and lists of file names. If we wanted to refer to a file in a Python program, those are the two things we would need to identify that specific file. 

In the code below, `os.path.join` lets us merge those two things together intelligently: it inserts the correct path separator between them, so the final path has no doubled separators (such as `//`) and no missing ones. 

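As a small illustration of that behavior, the sketch below joins the same file name onto a folder name written with and without a trailing separator. The file name `results.txt` is a made-up example, not a file from the 'data' folder.

```python
import os

# 'results.txt' is a hypothetical file name used only for illustration
folder_plain = 'data'
folder_trailing = 'data' + os.sep    # same folder, written with a trailing separator

path_a = os.path.join(folder_plain, 'results.txt')
path_b = os.path.join(folder_trailing, 'results.txt')

print(path_a)
print(path_b)
print(path_a == path_b)   # both spellings give the same path
```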
So now we can create a list of the full path to every file in our folder, which is really useful if we want to do something like run a program on every file in the directory. Note that because the inner loop runs once for each `file_name`, we only get paths where there is actually a file.

In [None]:
file_path_list =[]

for path in file_path_dict:
    for file_name in file_path_dict[path]:
        print(os.path.join(path,file_name))
        file_path_list.append(os.path.join(path,file_name))

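To make the idea of "running a program on every file" concrete, here is a minimal sketch that counts the lines in every non-hidden file of a folder tree. So that it runs anywhere, it builds its own small temporary tree first. The folder and file names (`site_a`, `notes.txt`, and so on) are invented for this example and are not part of the course data.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a small, irregular folder tree (all names are hypothetical)
    os.makedirs(os.path.join(root, 'site_a', 'week1'))
    os.makedirs(os.path.join(root, 'site_b'))
    sample_files = {
        os.path.join(root, 'site_a', 'week1', 'readings.txt'): 'one\ntwo\nthree\n',
        os.path.join(root, 'site_b', 'notes.txt'): 'alpha\nbeta\n',
        os.path.join(root, 'site_b', '.hidden'): 'ignore me\n',
    }
    for full_path, text in sample_files.items():
        with open(full_path, 'w') as f:
            f.write(text)

    # Walk the tree and "run a program" (count lines) on every visible file
    line_counts = {}
    for path, directories, files in os.walk(root):
        for file_name in files:
            if file_name.startswith('.'):      # skip hidden files
                continue
            full_path = os.path.join(path, file_name)
            with open(full_path) as f:
                # store the result under the path relative to the top folder
                line_counts[os.path.relpath(full_path, root)] = len(f.readlines())

for relative_path, count in sorted(line_counts.items()):
    print(relative_path, count)
```

The same loop should work on the 'data' folder by replacing `root` with `'data'` and dropping the setup step, assuming the files there are plain text.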

Now we can use that information to, for example, read the data out of one of those files with the `open()` technique that we learned previously. 

In [None]:
print('Contents of:', file_path_list[9])
print()

openfile = open(file_path_list[9],'r')
contents = openfile.read()
openfile.close()     # the parentheses are needed to actually call the method
print(contents)

## Questions: 
Looking at the code example above: 1) What does '[]' mean in the printout of the dictionary 'file_path_dict'? 2) what exactly was "joined" by the `os.path.join()` function?
    
* 
* 
* 


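The file reading shown earlier takes three separate steps: open, read, and close. A common alternative, sketched below, is a `with` block, which closes the file automatically when the block ends. The example writes its own small temporary file first so that it does not depend on the 'data' folder; the file name `example.txt` and its contents are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    example_path = os.path.join(folder, 'example.txt')   # hypothetical file

    # write a small file so there is something to read
    with open(example_path, 'w') as outfile:
        outfile.write('temperature,12.5\n')

    # read it back; the file is closed as soon as the block ends
    with open(example_path, 'r') as infile:
        contents = infile.read()

    print(contents)
    print(infile.closed)    # confirms that the file was closed for us
```

Because the closing step happens automatically, there is no `.close()` call to forget.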

## importing data with the csv library

Previously, and in the example above, we imported data using the `open()` function and the `.read()` and `.close()` methods. In our first lesson on importing data, we were able to import data from a .csv file, but our result was just a string of text that contained the data but had lost all of the data structure. Run the code below for a reminder:

In [None]:
openfile = open("data/Gradebook.csv","r")
grades = openfile.read()
openfile.close()
print(grades)
print()
print(type(grades))

The `csv` library allows us to read or write a .csv file. Looking at the <a href = 'https://docs.python.org/3/library/csv.html?highlight=csv'> documentation of the csv module</a> we see that many of the specific functions have to do with parsing particular formats of data. We are going to look at one particular way of reading in a .csv file, in which our data will be converted to a dictionary format. Later we will see that some of the data handling libraries we encounter (*pandas* in particular) also contain functions for importing a .csv file. 

In [None]:
import csv

grade_dict_list = []

# the line below creates a file object called csvfile (just like
## we created the file object 'openfile' in the example above)
## newline = '' prevents extra rows from being created on import, 
## don't worry about the details of that, just include it when you
## import a .csv file.

with open('data/Gradebook.csv', newline='') as csvfile:         
    reader = csv.DictReader(csvfile)     # function applied to file object
    print(type(csvfile))
    print(type(reader))                  # see types of objects created
    print()
    
    for row in reader:
        print(row)                       # each row is a dictionary
        grade_dict_list.append(row)      # make a list of the dictionaries
        
print()
print(grade_dict_list)                   # print list of dictionaries

# To retrieve the data we can loop through the list
    
for entry in grade_dict_list:              # each student is a dictionary                
    if entry['Student'] == 'Marie':        # check the value stored under the key 'Student'
        print('Marie score for exam 3: ' , entry['Exam3']) # print value we want
        
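One detail worth checking is the type of the values that `csv.DictReader` hands back. The sketch below reads a tiny made-up gradebook from a string instead of a file, so it runs without `data/Gradebook.csv`. The column names `Student` and `Exam3` match the code above; the other columns, the second student, and all of the scores are invented for illustration.

```python
import csv
import io

# A made-up gradebook; io.StringIO lets csv read a string as if it were a file
sample_csv = "Student,Exam1,Exam2,Exam3\nMarie,91,85,88\nAlbert,78,92,81\n"

reader = csv.DictReader(io.StringIO(sample_csv))
rows = list(reader)                      # one dictionary per student

print(type(rows[0]['Exam3']))            # the values arrive as strings

# convert to int before doing any math with the scores
exam3_scores = {row['Student']: int(row['Exam3']) for row in rows}
print(exam3_scores)
print(sum(exam3_scores.values()) / len(exam3_scores))
```

Converting with `int()` (or `float()`) is a design choice that fits this sample; real gradebooks may contain blanks or text that need extra handling.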

## Question: 
What did the dictionary reader function know how to do automatically? Is this more useful than having the data as a string? Why or why not?
    
* 
* 
* 

## the datetime library

The `datetime` library is extremely useful. It allows us to easily handle dates, times, and time zones within our programs. Look at the <a href='https://docs.python.org/3/library/datetime.html'>documentation for datetime</a>. Specifically, scroll down to the documentation for *Available Types*. This shows us that datetime works by creating **objects** that represent times, dates, a combination of the two, a difference between two dates or times, or a time zone. 





In [None]:
import datetime as dt         #this is the standard abbreviation  

date1 = dt.date(2022, 9, 17)
date2 = dt.date(year = 2022, month = 10, day = 31)
date3 = dt.date(day = 31, month = 10, year = 2022)
# date4 = dt.date(17,9,2022)

print(date1, '\n', date2,'\n', date3, sep='')
print(type(date1))
print()

# creating time objects

time1 = dt.time(11, 12, 42)
time2 = dt.time(hour = 23, minute = 12, second = 13)
time3 = dt.time(second = 13, minute = 12, hour = 23)
# time4 = dt.time(42, 11, 12)

print(time1, '\n', time2, '\n', time3, sep='')
print(type(time1))
print()

# creating datetime objects

dtime1 = dt.datetime(2022, 9, 17, 11, 12, 42)
dtime2 = dt.datetime(year = 2022, month = 9, day = 17 , hour = 23, minute = 12, second = 13)
dtime3 = dt.datetime(2022, 9, 17)

print(dtime1,'\n', dtime2, '\n', dtime3, sep='' )
print(type(dtime1))
print()

#extracting values from datetime objects

print(dtime1.year)
print(dtime1.minute)
print(type(dtime1.year))

### Question: 
Uncomment (one at a time) the lines that create date4 and time4. Considering the result, as well as the output of the code without those lines, what have we learned about how the date and time objects can be created and how information is stored within them?
    
* 
* 
* 



A little consideration should allow us to realize that dates and times can't just be treated like ordinary numbers, particularly when it comes to doing time and date math. Times are based on cycles of 60 seconds, 60 minutes, and 24 hours. Calendars are extremely variable, with even the number of days within months varying between different months. 

## Question: 
Answer the question below using a calendar, AND list what you would have to consider to answer it programmatically using standard math with no datetime classes:

What date and time is it 3 days, 5 hours and 40 minutes after the following date and time:  February 27, 2024, 11:37 PM?
    
* 
* 
* 


This is a question about time and date math. This is something we might need to do in a computer program. For example, think about a data collection device that generates a time stamp each time data is collected. A user of that data might want to be able to easily calculate the average interval between data collections. 

The example below shows Python code that solves the problem above programmatically. 

In [None]:
import datetime as dt

startdate = dt.datetime(2024, 2, 27, 23, 37,0)
print('Starting Date')
print(startdate)
print(type(startdate))
print()

delta = dt.timedelta(days =3, hours =5, minutes =40)

enddate = startdate + delta
print('Ending Date')
print(enddate)
print(type(enddate))
print()

difference = enddate-startdate
print('Delta')
print(difference)
print(type(difference))


## Question: 
Comment the code above explaining what each line does and why it produces the observed results. If helpful, use the documentation for the datetime module. What is timedelta? How does datetime simplify what would be required to program this solution from scratch (that you listed as part of the answer to the previous question)?

* 
* 
* 


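The data collection scenario mentioned earlier can be sketched with the same tools. The example below takes a short list of time stamps and computes the average interval between them. The time stamps are hypothetical values chosen for illustration, not output from a real device.

```python
import datetime as dt

# Hypothetical time stamps from a data collection device
timestamps = [
    dt.datetime(2024, 2, 27, 23, 37, 0),
    dt.datetime(2024, 2, 27, 23, 52, 0),
    dt.datetime(2024, 2, 28, 0, 12, 0),
    dt.datetime(2024, 2, 28, 0, 37, 0),
]

# Subtracting two datetime objects gives a timedelta object
intervals = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

# timedelta objects can be added together and divided by a number
average_interval = sum(intervals, dt.timedelta()) / len(intervals)

print(intervals)
print(average_interval)
```

Note that two of the time stamps fall after midnight on the next day, and the subtraction handles the change of date without any extra code.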

## Delorean, an external, specialized library

So far these examples have been drawn from Python's built-in libraries, which are important to understand as they provide essential functionality for writing programs covering a wide range of topics. Now we will consider an external library that is designed to extend the functionality of `datetime`.

Delorean is named after the time traveling car in the film 'Back to the Future' and in <a href = 'https://delorean.readthedocs.io/en/latest/'>its documentation</a> is described as 'time travel made easy'. This isn't something that you have to know about, it's just an example that does some fun and useful things. But first we have to install it. Some of the examples included here are lightly modified from the <a href ='https://colab.research.google.com/drive/1vK-7H4cAddK-ScddgzzxMebnvkrVfp_O?usp=sharing#scrollTo=HGLW889fRbxc'>supporting code</a> for a <a href = 'https://www.youtube.com/watch?v=-xSv-czVtys'>talk given by Kimberly Fessel, PhD in 2020</a>, which is itself derived from the Delorean documentation.

Uncomment the code below and run it once, then comment it out again so you aren't reinstalling every time the notebook runs:

In [None]:
# pip install delorean

OK, the first thing we are going to do is create a Delorean object. By default, we can use the command `Delorean()` to get the current system time from your computer. 

Or, we can create a Delorean object from an existing datetime object. 

In the code below we create two different Delorean objects using the two methods, and then print them out. Note that Delorean can also return a standard datetime object through the `.datetime` attribute of the Delorean object. 

In [None]:
from delorean import Delorean

d = Delorean()
patriotic_datetime = dt.datetime(1776, 7, 4, 12, 0, 0)
patriotic_d = Delorean(patriotic_datetime, timezone = 'US/Eastern')

print(d)
print(type(d))
print(d.datetime)
print(type(d.datetime))
print()
print(patriotic_d)
print(patriotic_d.datetime)

Let's take a simplified look at the Delorean object `d`, and then that object shown with the `.datetime` attribute (as above) and the `.naive` attribute, which excludes time zone information. We can also access any part of the Delorean object by referring to the datetime object within it. 

In [None]:
print(d)
print()
print(d.datetime)
print()
print(d.naive)
print()
print('Year = ', d.datetime.year)

One of the advantages of Delorean is the `.shift` method, which allows time zones to be easily changed. Let's apply it to the object `d` to change it to the US east coast time zone instead of UTC (Coordinated Universal Time), which for practical purposes is the same as Greenwich Mean Time. 

In [None]:
d.shift('US/Eastern')

If you are in the Eastern time zone of the US, this should look more like the time you actually ran the code a few blocks above. 

The standard datetime module also holds time zone information. This may seem abstract, but it might be important if you were writing an application that handles data from different parts of the world.

(If you want to see all the time zones available in Python, uncomment and run the following code, although you may want to clear the results afterwards.)

In [None]:
# import pytz             # python time zones module

# pytz.all_timezones      # this will give you a list of 500+ time zones

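For comparison, here is a minimal sketch of time zone handling with only the standard `datetime` module. It uses a fixed offset of five hours behind UTC as a stand-in for a real time zone. That offset and the date are assumptions for illustration, and a fixed offset does not follow daylight saving time changes the way a named zone such as 'US/Eastern' does.

```python
import datetime as dt

# An "aware" datetime: noon UTC on a made-up date
utc_noon = dt.datetime(2024, 1, 15, 12, 0, 0, tzinfo=dt.timezone.utc)

# A fixed offset of five hours behind UTC (an assumption for illustration;
# it does not follow daylight saving time changes)
five_behind = dt.timezone(dt.timedelta(hours=-5))

# express the same moment in the other time zone
local_time = utc_noon.astimezone(five_behind)

print(utc_noon)
print(local_time)
print(utc_noon == local_time)   # same instant, shown in two time zones
```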
Delorean also allows for time math that works just like what we saw in datetime. It also permits certain forms of natural language time math to be used. 

In [None]:
print((d + dt.timedelta(weeks = 2)).datetime)

print(d.next_tuesday().datetime)
print(d.next_week().datetime)
print(d.next_month().datetime)
print(d.next_year().datetime)


One very cool feature of Delorean is the `parse()` function, which allows for what is known as natural language processing of dates. 

In [None]:
from delorean import parse

print(parse('november 5th, 1963 '))
print(parse('5th november 1963 12:01 pm'))
print(parse('11/5/63 12 pm'))
print(parse('11/5/63 12 pm', dayfirst=False))

## Question: 
What works and doesn't work in the parsing function? Try some more things that *might* work and share your results below. 

* 
* 
* 



A final fun feature is the `humanize()` method for a Delorean object, which attempts to turn the object into a form that makes sense in normal 'human' language. 

In [None]:
print(d.humanize())
print(patriotic_d.humanize())

## Exercise: 
For practice, create a Delorean object that represents the current time. Set it to the correct time zone. Create objects based on this object that represent various points in the past and future. Print the output of `humanize()` applied to each of these new objects to figure out what kind of output you get for things in the near and distant past and future. 

In [None]:
#
## Your code here
#

![Alt text that will appear on mouseover](images/TJU_logo_dummy_image.png "Dummy image")