# Python Introduction

In this module we'll briefly walk through a handful of tools and techniques in Python — but I don't expect you to remember every last detail! My goal is to give you a lay of the land; think of it as an interactive cheat sheet you can come back to later.

Each rectangular box in the Jupyter interface is called a **cell**. Click inside a cell below to highlight it, then click the "play" button at the top of the window to run the code. Or just press **`Shift+Return`**, which does the same thing.

In [2]:
# Text on the right side of a pound sign is "commented out," meaning it won't run as code.
# Let's start by running the same commands we just tried in the Python shell.

1+1

2

In [3]:
print('1337 skillz')

1337 skillz


In [4]:
# Note that if we include multiple expressions in a cell that each generate output, 
# we'll only see output from the last one in the series.

1+1
50-9
200+200

400

In [5]:
# But if we include multiple print() functions, each one will write to a separate line.
# For the sake of clarity, I tend to print all my outputs by default.

print(1+1)
print(50-9)
print(200+200)

2
41
400


In [6]:
# We use the equals sign (with or without spaces on either side) to assign variables in Python.
# These variables then stand in for the values we've given them.

abc = 42
xyz="Hello"

print(abc)
print(xyz)

42
Hello


## *Arithmetic operations*

In [8]:
# Arithmetic syntax is fairly straightforward. In this cell we're working with integers, a.k.a. ints.

int1 = 13
int2 = 7

print(int1+int2)    # addition
print(int1-int2)    # subtraction
print(int1*int2)    # multiplication
print(int1/int2)    # division
print(int1**int2)   # exponentiation

# Note that dividing two ints with "/" rounds down in Python 2, which produced the output below.
# In Python 3, "/" returns a float instead.

20
6
91
1
62748517
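
If you want the rounded-down result on purpose, whichever Python version you're running, here's a minimal sketch using the floor-division operator `//` and the modulo operator `%`. Neither appears in the cell above, and the variable names `quotient`, `remainder` and `exact` are just for illustration.

```python
int1 = 13
int2 = 7

quotient = int1 // int2       # floor division: rounds down in Python 2 and Python 3
remainder = int1 % int2       # modulo: what is left over after floor division
exact = float(int1) / int2    # making one operand a float keeps the fraction

print(quotient)
print(remainder)
print(exact)
```

Multiplying `quotient` by `int2` and adding `remainder` gets you back to `int1`, which is a handy sanity check.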


In [10]:
# Floating-point numbers, known as floats, are numbers with decimal values. 
# They're a different data type than ints, but the same syntax applies for arithmetic.

float1 = 13.0
float2 = 7.0

print(float1+float2)   # add
print(float1-float2)   # subtract
print(float1*float2)   # multiply
print(float1/float2)   # divide
print(float1**float2)  # exponent

20.0
6.0
91.0
1.85714285714
62748517.0


In [20]:
# When we combine floats and ints in arithmetic expressions, the result is always a float.

x = 5
y = 8.0

print(x+x)
print(x+y)

10
13.0


In [19]:
# We can, however, use the int() and float() functions to "cast" floats to ints and vice versa.
# Note that int() simply drops the fractional part, so a positive float always rounds down.

print(int(15.9))
print(float(7))

15
7.0
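
Negative numbers are where "drops the fractional part" and "rounds down" part ways. Here's a short sketch comparing `int()` with `math.floor()` and the built-in `round()`; the sample values are made up for illustration.

```python
import math

positive = int(15.9)             # 15: the fractional part is dropped
negative = int(-15.9)            # -15: truncation moves toward zero, not down
floored = math.floor(-15.9)      # floor() always rounds down, giving -16
nearest = round(15.9)            # round() goes to the nearest whole number

print(positive)
print(negative)
print(floored)
print(nearest)
```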


In [39]:
# We can also cast strings (if they happen to be numbers) to int and float types.

print(int("5"))
print(float("5"))

5
5.0
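
If the string doesn't look like a number, `int()` and `float()` raise a `ValueError`. Here's one way to guard against that, sketched with a hypothetical helper called `to_number()` (it isn't part of Python; the name is mine).

```python
def to_number(text):
    """Return text as a float, or None if it can't be converted (hypothetical helper)."""
    try:
        return float(text)
    except ValueError:
        # float() raises ValueError for strings such as "five"
        return None

print(to_number("5"))
print(to_number("3.14"))
print(to_number("five"))
```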


## *Strings and lists*

In [180]:
# Now try using the "+" operator on two strings.

zyx = 'Hello'

print(zyx + " Jupyter!")

# Note that we've used "+" for two different purposes so far: adding numbers as well as 
# combining, or concatenating, strings. In the parlance of CS, "+" is an "overloaded" operator.

Hello Jupyter!


In [30]:
# In the cell above we enclosed one string in single quotes and put the other in double quotes.
# Either is fine; whichever you choose is up to you. But if your string contains a single quote 
# character, the simplest fix is to enclose it in double quotes.

print("This is a string that's got an apostrophe in it.")

This is a string that's got an apostrophe in it.


In [79]:
# If you're working with a string that contains double quotes and/or line breaks, use triple 
# quotes instead (three single quotes in a row).

print('''Fleas and lice
a horse pissing
next to my pillow''')

# (the best Bashō poem, translated by David Young)

Fleas and lice
a horse pissing
next to my pillow


In [113]:
# The replace() function replaces a character or series of characters with another.

filename = "11_OSullivan-Maggie_10_Lottery-&-Requiem_States-of-Emergency_Rockdrill-11_05.mp3"

print(filename.replace("_",", "))

11, OSullivan-Maggie, 10, Lottery-&-Requiem, States-of-Emergency, Rockdrill-11, 05.mp3


In [116]:
# And you can chain replace() together to make multiple replacements.

filename = "11_OSullivan-Maggie_10_Lottery-&-Requiem_States-of-Emergency_Rockdrill-11_05.mp3"

print(filename.replace("_",", ").replace(".mp3",""))

11, OSullivan-Maggie, 10, Lottery-&-Requiem, States-of-Emergency, Rockdrill-11, 05


In [127]:
# Use the split() function to divide a string into a list using a specified delimiter.

filename = "11_OSullivan-Maggie_10_Lottery-&-Requiem_States-of-Emergency_Rockdrill-11_05.mp3"

fields = filename.split("_")

print(fields)

['11', 'OSullivan-Maggie', '10', 'Lottery-&-Requiem', 'States-of-Emergency', 'Rockdrill-11', '05.mp3']


In [126]:
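# join() does the reverse of split(): it glues a list of strings together, placing the
# string it is called on (here " | ") between the items.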

fields = ['11', 'OSullivan-Maggie', '10', 'Lottery-&-Requiem', 'States-of-Emergency', 'Rockdrill-11', '05.mp3']

joined_sentence = " | ".join(fields)

print(joined_sentence)

11 | OSullivan-Maggie | 10 | Lottery-&-Requiem | States-of-Emergency | Rockdrill-11 | 05.mp3


In [120]:
## Can you do a better job of parsing the filename above?
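
Here's one possible answer, offered as a sketch rather than the "right" one. It assumes the underscores separate fields and the hyphens stand in for spaces, which is a guess based on this single filename. `os.path.splitext()` comes from Python's standard library; the variable names are my own.

```python
import os

filename = "11_OSullivan-Maggie_10_Lottery-&-Requiem_States-of-Emergency_Rockdrill-11_05.mp3"

# Separate the extension first so it doesn't cling to the last field.
basename, extension = os.path.splitext(filename)

# Turn hyphens into spaces, then split the result on underscores.
fields = basename.replace("-", " ").split("_")

print(fields)
print(extension)
```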





In [121]:
# There are a few ways to represent an ordered sequence of items in Python, but 
# we’ll be using lists most frequently.

eu_countries=['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Republic of Cyprus', 'Czech Republic', 'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg', 'Malta', 'Netherlands', 'Poland', 'Portugal', 'Romania', 'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'UK']

print(eu_countries)

['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Republic of Cyprus', 'Czech Republic', 'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg', 'Malta', 'Netherlands', 'Poland', 'Portugal', 'Romania', 'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'UK']


In [122]:
# We can refer to individual list members using bracket notation. As in most programming 
# languages, we begin counting from 0 when working with ordered data — so list index 3
# is actually the fourth item in the list.

eu_countries[3]

'Croatia'

In [123]:
# If you try to access an out-of-range index value, you’ll get an error.

eu_countries[99]

IndexError: list index out of range

In [42]:
# We can also create a subset of a list using Python’s slice notation.

eu_countries[3:7]

['Croatia', 'Republic of Cyprus', 'Czech Republic', 'Denmark']


In [43]:
eu_countries[6:]

['Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg', 'Malta', 'Netherlands', 'Poland', 'Portugal', 'Romania', 'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'UK']


In [48]:
eu_countries[:7]

['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Republic of Cyprus', 'Czech Republic', 'Denmark']


In [47]:
eu_countries[-3:]

['Spain', 'Sweden', 'UK']
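
The slice examples above all follow one rule: the start index is included and the stop index is not. Here's a compact sketch on a short, made-up list where that's easier to see.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

print(letters[1:4])   # items at indexes 1, 2 and 3; index 4 is excluded
print(letters[:2])    # from the beginning up to, but not including, index 2
print(letters[4:])    # from index 4 through the end
print(letters[-2:])   # the last two items
print(letters[-1])    # a negative index counts back from the end
```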


In [50]:
# If we want to know the length of a list or string, the len() function can tell us.

len(eu_countries)

28

In [132]:
# The "in" operator checks whether one string is a substring of another ...

"green" in "A green hunting cap squeezed the top of the fleshy balloon of a head."

True

In [134]:
# ... and whether a given value is included in a list.

print(37 in [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59])

print("purple" in ["red", "green", "blue"])

True
False


In [152]:
# Appending items to a list, beginning with an empty list

names = []

names.append('Janice')

print(names)

['Janice']


In [153]:
names.append('Jordan')
names.append('Jonathan')
names.append('Julie')
names.append('Jill')

print(names)

['Janice', 'Jordan', 'Jonathan', 'Julie', 'Jill']


In [154]:
# List concatenation

numbers = [99,34,54,23,11,203]

print(names + numbers)

['Janice', 'Jordan', 'Jonathan', 'Julie', 'Jill', 99, 34, 54, 23, 11, 203]


In [155]:
# More list concatenation

names += ['Jim','Janette','Jerry', 'Heather']

print(names)

['Janice', 'Jordan', 'Jonathan', 'Julie', 'Jill', 'Jim', 'Janette', 'Jerry', 'Heather']


In [157]:
# Note that append() is not the same as concatenation with "+": it adds the entire list
# as a single nested item.

names.append(numbers)

print(names)

['Janice', 'Jordan', 'Jonathan', 'Julie', 'Jill', 'Jim', 'Janette', 'Jerry', 'Heather', [99, 34, 54, 23, 11, 203]]
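
To see the difference side by side, here's a small sketch that contrasts `append()` with the list method `extend()`, which behaves like `+=`. The shortened lists and the names `nested` and `flat` are just for illustration.

```python
names = ['Janice', 'Jordan']
numbers = [99, 34]

nested = list(names)      # list() makes a copy, so the original stays untouched
nested.append(numbers)    # the whole list goes in as one item

flat = list(names)
flat.extend(numbers)      # each number goes in as its own item

print(nested)
print(flat)
```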


## *Comparison and conditionals*

In [64]:
# We can check whether two values are the same with the comparison operator "==", which works 
# for strings as well as numbers.

number = 12
word = "giraffe"

print(number==12)
print(number==13)
print(word == "giraffe")

True
False
True


In [60]:
# The "!=" operator asks whether two values are different.

print(number!=12)
print(number!=13)
print(word!="pineapple")

False
True
True


In [66]:
# True and False are known as boolean values, which together comprise their own data type.

print(True==True)
print(True==False)

True
False


In [67]:
# Conditional statements are a fundamental part of all programming languages. 
# We use the "if" statement to make something happen when a boolean expression evaluates to True.

number=12

if number==12:
    print("The value is equal to 12.")

The value is equal to 12.


In [68]:
number=11

if number!=12:
    print("The value is not equal to 12.")

The value is not equal to 12.


In [None]:
# By adding "else" below "if," we can tell Python to do something else when the boolean
# expression evaluates to False.

number=10

if number==12:
    print("The value is equal to 12.")
else:
    print("The value is not equal to 12.")
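
When there are more than two cases to tell apart, Python's `elif` (short for "else if") slots in between `if` and `else`. Here's a minimal sketch; the `>` comparison and the `message` variable haven't come up yet and are only here for illustration.

```python
number = 10

if number == 12:
    message = "The value is equal to 12."
elif number > 12:
    message = "The value is greater than 12."
else:
    message = "The value is less than 12."

print(message)
```

Python checks the conditions from top to bottom and runs only the first branch that matches.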

## *Loops and functions*

In [70]:
# A "for loop" is a structure that lets us iterate through lists and related data structures 
# so we can do something with each item one at a time.

for country in eu_countries:
    print(country + ' is great.')

Austria is great.
Belgium is great.
Bulgaria is great.
Croatia is great.
Republic of Cyprus is great.
Czech Republic is great.
Denmark is great.
Estonia is great.
Finland is great.
France is great.
Germany is great.
Greece is great.
Hungary is great.
Ireland is great.
Italy is great.
Latvia is great.
Lithuania is great.
Luxembourg is great.
Malta is great.
Netherlands is great.
Poland is great.
Portugal is great.
Romania is great.
Slovakia is great.
Slovenia is great.
Spain is great.
Sweden is great.
UK is great.
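
Sometimes you want each item's position as well as the item itself. Here's a short sketch using the built-in `enumerate()` function on a shortened, stand-in list of countries.

```python
countries = ['Austria', 'Belgium', 'Bulgaria']

numbered = []
for position, country in enumerate(countries):
    # enumerate() hands back the index and the item together
    numbered.append(str(position) + ': ' + country)

print(numbered)
```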


In [71]:
# We can create functions to automate processes we want to repeat. Use the "def" declaration 
# to begin a function definition. The code below will produce the same output as the last example.

def is_great(word):
    return word + ' is great.'

for country in eu_countries:
    print(is_great(country))

Austria is great.
Belgium is great.
Bulgaria is great.
Croatia is great.
Republic of Cyprus is great.
Czech Republic is great.
Denmark is great.
Estonia is great.
Finland is great.
France is great.
Germany is great.
Greece is great.
Hungary is great.
Ireland is great.
Italy is great.
Latvia is great.
Lithuania is great.
Luxembourg is great.
Malta is great.
Netherlands is great.
Poland is great.
Portugal is great.
Romania is great.
Slovakia is great.
Slovenia is great.
Spain is great.
Sweden is great.
UK is great.
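
Functions can take more than one argument, and an argument can have a default value that's used when you leave it out. Here's a sketch with a hypothetical function, `describe()`, that generalizes `is_great()`.

```python
def describe(word, adjective='great'):
    """Build a sentence; 'adjective' falls back to 'great' when omitted."""
    return word + ' is ' + adjective + '.'

for country in ['Austria', 'Belgium']:
    print(describe(country))

print(describe('Malta', 'sunny'))
```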


## *Navigating the file system*

In [72]:
# This cell imports the 'os' module (which stands for "operating system"),
# then prints the current working directory, just like "pwd" does in Bash.

import os

os.getcwd()

'/home/sharedfolder/HILT-Audio-ML/Day_1'

In [74]:
# Just like "cd" in Bash, os.chdir() changes the working directory.

os.chdir("/")

os.getcwd()

'/'

In [75]:
# Use os.listdir() with any directory pathname as an argument, and it will return a 
# list of everything in that directory, both files and subdirectories. As in Bash, "./" 
# refers to the current working directory.

filenames = os.listdir("./")

print(filenames)

['usr', 'var', 'tmp', 'root', 'dev', 'etc', '.dockerenv', 'home', 'opt', 'bin', 'lib', 'run', 'sbin', 'srv', 'mnt', 'lib64', 'proc', 'sys', 'boot', 'media']
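
`os.listdir()` doesn't say which entries are files and which are directories. Here's a sketch of one way to keep only the files, using `os.path.isfile()` and `os.path.join()`. It builds a throwaway directory with `tempfile` so it runs anywhere; the names `subfolder` and `notes.txt` are made up.

```python
import os
import tempfile

# Build a throwaway directory so the example doesn't depend on your file system.
folder = tempfile.mkdtemp()
os.mkdir(os.path.join(folder, 'subfolder'))
with open(os.path.join(folder, 'notes.txt'), 'w') as fo:
    fo.write('hello')

files_only = []
for name in sorted(os.listdir(folder)):
    full_path = os.path.join(folder, name)   # listdir() returns bare names, not full paths
    if os.path.isfile(full_path):
        files_only.append(name)

print(files_only)
```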


In [167]:
# Now let's rename some files. First we'll change our working directory to 
# "sample_audio," then create a list of filenames in the directory.

os.chdir('/home/sharedfolder/sample_audio')

filenames = os.listdir('./')

print(filenames)



In [169]:
# The pprint module, short for "prettyprint," can help make lists more readable.

from pprint import pprint

pprint(filenames)

['01_sine_440.wav',
 '02_CBD-440607_NBC1600-MaryNobleBackstageWife_chime.wav',
 '03_357305__mtg__clarinet-f-major.wav',
 '04_brassproject_patteson.mp3',
 '05_Ravel_Bolero_Andre_Rieu.mp3',
 '06_Amen_Break_-_normal_fast_and_slow_version-qwQLk7NcpO4.wav',
 '07_acoustic-kick.wav',
 '08_Spinee_-_Save_Me-157140751.mp3',
 '10_Creeley_Company_supercut.mp3',
 '11_OSullivan-Maggie_10_Lottery-&-Requiem_States-of-Emergency_Rockdrill-11_05.mp3',
 '12_Mi-Kim-Myung_The-Oceans-Held-Up-a-Snarling-Dog_Segue-ZINC_2-20-16.mp3',
 '13_Myles_-_Philly_ICA_-_2010_-_interstitial.mp3',
 '14_CBD-440606_NBC0500-News.annotations.csv',
 '14_CBD-440606_NBC0500-News.mp3',
 '15_square_150.wav']


In [170]:
# Now we'll use os.rename() to replace spaces with underscores in our audio files.

filenames = os.listdir('./')

for filename in filenames:
    os.rename(filename, filename.replace(' ','_'))

pprint(os.listdir('./'))

['01_sine_440.wav',
 '02_CBD-440607_NBC1600-MaryNobleBackstageWife_chime.wav',
 '03_357305__mtg__clarinet-f-major.wav',
 '04_brassproject_patteson.mp3',
 '05_Ravel_Bolero_Andre_Rieu.mp3',
 '06_Amen_Break_-_normal_fast_and_slow_version-qwQLk7NcpO4.wav',
 '07_acoustic-kick.wav',
 '08_Spinee_-_Save_Me-157140751.mp3',
 '10_Creeley_Company_supercut.mp3',
 '11_OSullivan-Maggie_10_Lottery-&-Requiem_States-of-Emergency_Rockdrill-11_05.mp3',
 '12_Mi-Kim-Myung_The-Oceans-Held-Up-a-Snarling-Dog_Segue-ZINC_2-20-16.mp3',
 '13_Myles_-_Philly_ICA_-_2010_-_interstitial.mp3',
 '14_CBD-440606_NBC0500-News.annotations.csv',
 '14_CBD-440606_NBC0500-News.mp3',
 '15_square_150.wav']
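
Renaming in bulk can overwrite files if two names collapse into the same result. Here's a more cautious sketch of the same idea, run in a temporary directory with made-up filenames. It skips names that need no change and refuses to overwrite an existing file.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
for name in ['my song.wav', 'already_clean.wav']:
    open(os.path.join(folder, name), 'w').close()   # create empty placeholder files

for name in os.listdir(folder):
    new_name = name.replace(' ', '_')
    source = os.path.join(folder, name)
    target = os.path.join(folder, new_name)
    # Skip names that need no change, and never overwrite an existing file.
    if new_name != name and not os.path.exists(target):
        os.rename(source, target)

print(sorted(os.listdir(folder)))
```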


## *List comprehensions*

In [174]:
# Python's "list comprehension" feature makes it possible to filter and transform 
# lists in a single line of code. Here we create a list of basenames, i.e. filenames
# with their extensions removed.

filenames = os.listdir('./')

basenames = [item.replace('.mp3','').replace('.wav','').replace('.csv','') for item in filenames]

pprint(basenames)

['01_sine_440',
 '02_CBD-440607_NBC1600-MaryNobleBackstageWife_chime',
 '03_357305__mtg__clarinet-f-major',
 '04_brassproject_patteson',
 '05_Ravel_Bolero_Andre_Rieu',
 '06_Amen_Break_-_normal_fast_and_slow_version-qwQLk7NcpO4',
 '07_acoustic-kick',
 '08_Spinee_-_Save_Me-157140751',
 '10_Creeley_Company_supercut',
 '11_OSullivan-Maggie_10_Lottery-&-Requiem_States-of-Emergency_Rockdrill-11_05',
 '12_Mi-Kim-Myung_The-Oceans-Held-Up-a-Snarling-Dog_Segue-ZINC_2-20-16',
 '13_Myles_-_Philly_ICA_-_2010_-_interstitial',
 '14_CBD-440606_NBC0500-News.annotations',
 '14_CBD-440606_NBC0500-News',
 '15_square_150']


In [None]:
# If our goal is to extract a list of basenames from a large, heterogeneous set 
# of files, how might the method used above lead us astray? 

# Any ideas on how we might do better?
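
One possible answer, as a sketch: chained `replace()` calls only know about the extensions we list, they're case-sensitive, and they also strip ".mp3" from the middle of a name. The standard library's `os.path.splitext()` removes whatever follows the last dot instead. The last two filenames below are hypothetical, added to show the difference.

```python
import os

filenames = ['07_acoustic-kick.wav',
             '14_CBD-440606_NBC0500-News.annotations.csv',
             'interview.MP3',              # hypothetical: uppercase extension
             'my.mp3.backup.flac']         # hypothetical: ".mp3" in the middle

# splitext() returns a (basename, extension) pair; we keep the first part.
basenames = [os.path.splitext(item)[0] for item in filenames]

print(basenames)
```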

In [177]:
# We can also use list comprehensions to extract a subset of items based on a
# conditional statement. Here we extract basenames from just the MP3s in our directory.

filenames = os.listdir('./')

mp3_basenames = [item.replace('.mp3','') for item in filenames if '.mp3' in item]

pprint(mp3_basenames)

['04_brassproject_patteson',
 '05_Ravel_Bolero_Andre_Rieu',
 '08_Spinee_-_Save_Me-157140751',
 '10_Creeley_Company_supercut',
 '11_OSullivan-Maggie_10_Lottery-&-Requiem_States-of-Emergency_Rockdrill-11_05',
 '12_Mi-Kim-Myung_The-Oceans-Held-Up-a-Snarling-Dog_Segue-ZINC_2-20-16',
 '13_Myles_-_Philly_ICA_-_2010_-_interstitial',
 '14_CBD-440606_NBC0500-News']


In [None]:
# If we just want to filter items in a list without making any changes, 
# here's what that looks like.

filenames = os.listdir('./')

mp3_filenames = [item for item in filenames if '.mp3' in item]

pprint(mp3_filenames)

In [None]:
# Since MP3 files in real-world collections might end in ".MP3" or ".mp3" or 
# perhaps ".Mp3," using lower() to convert the filename to lowercase in your 
# conditional statement is a good practice.

filenames = os.listdir('./')

mp3_filenames = [item for item in filenames if '.mp3' in item.lower()]

pprint(mp3_filenames)
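
Checking whether '.mp3' appears anywhere in the name will also match a file like "notes.mp3.txt". Here's a sketch of a stricter test using the string method `endswith()`; the filenames are hypothetical.

```python
filenames = ['track.mp3', 'TRACK2.MP3', 'notes.mp3.txt', 'tone.wav']   # hypothetical names

# Lowercase the name first, then require that it ends with the extension.
mp3_filenames = [item for item in filenames if item.lower().endswith('.mp3')]

print(mp3_filenames)
```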

In [185]:
# We can also convert strings to uppercase and title case like so:

qrst = "The oceans held up a snarling dog"

print(qrst.lower())
print(qrst.upper())
print(qrst.title())

the oceans held up a snarling dog
THE OCEANS HELD UP A SNARLING DOG
The Oceans Held Up A Snarling Dog


## *Reading and writing text files*

In [193]:
# Here's the shortest format for reading a text file and assigning it to a string variable in Python.

text = open("14_CBD-440606_NBC0500-News.annotations.csv").read()

text

"979.696326531,1,469.600362812,David Anderson report on the RAF\n1875.853061224,1,96.391836735,First invasion announcement from Gen. Eisenhower's headquarters\n2208.310566893,1,148.121632653,The Stars and Stripes Forever\n"

In [199]:
# More often, we'll want to import a text file as a list of lines, discarding newline characters.

line_list=open("14_CBD-440606_NBC0500-News.annotations.csv").read().splitlines()

pprint(line_list)

['979.696326531,1,469.600362812,David Anderson report on the RAF',
 "1875.853061224,1,96.391836735,First invasion announcement from Gen. Eisenhower's headquarters",
 '2208.310566893,1,148.121632653,The Stars and Stripes Forever']
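
Each line is still one long string. To get at the individual values, here's a sketch using the standard library's `csv` module, which unlike a plain `split(",")` also copes with quoted fields that contain commas. I'm not assuming what the columns mean, only that the first one is numeric and the last one is text, as in the lines above.

```python
import csv

line_list = ['979.696326531,1,469.600362812,David Anderson report on the RAF',
             '2208.310566893,1,148.121632653,The Stars and Stripes Forever']

# csv.reader() accepts any sequence of lines and yields one list of fields per line.
rows = list(csv.reader(line_list))

for row in rows:
    first_value = float(row[0])    # convert the first column from string to float
    print(str(first_value) + ' -> ' + row[3])
```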


In [200]:
# We can write string data to a new text file like so:

with open("text_output_1.txt","w") as fo:
    fo.write("This is the first line.\n")
    fo.write("This is the second line.")

In [201]:
# Or like so, which writes each item in a list to a separate line.

lines = ["This", "is", "a", "list", "of", "lines."]

with open("text_output_2.txt","w") as fo:
    fo.writelines([item+'\n' for item in lines])
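
The same `with` pattern works for reading, and it closes the file for us when the block ends. Here's a round-trip sketch that writes a list of lines and reads them back. It writes to a temporary directory so it won't clutter your working directory; `fi` and `lines_read` are names I've made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'text_output_2.txt')
lines = ["This", "is", "a", "list", "of", "lines."]

with open(path, 'w') as fo:
    fo.writelines([item + '\n' for item in lines])

with open(path) as fi:
    lines_read = fi.read().splitlines()   # splitlines() discards the newline characters

print(lines_read == lines)
```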

## *Executing Bash commands from within Python*

While there are Python packages available for just about any task you can think of, the command-line programs we'd use in Bash are often much simpler to understand. And while Bash has its own powerful scripting syntax, Python scripts are often easier to write and more readable. Fortunately, we can get the best of both worlds by using Python to execute Bash commands.

In [189]:
# os.system() executes a string argument as if you'd typed it directly into Bash. 
# This is fine for simple commands like this one, which creates a new directory 
# called "new_directory." The 0 shown below is the command's exit status; 0 means success.

dir_name= "new_directory"

os.system('mkdir ' + dir_name)

0

In [187]:
# Recall, however, that Bash interprets every space that isn't escaped with a backslash 
# as a delimiter between arguments. It's possible to work around this constraint by 
# adding double quotes around everything, but this usually results in jumbled code and 
# often creates unexpected results. Instead of pulling your hair out, you should use 
# subprocess.call() by default. It takes a list of arguments and runs the program 
# directly, without passing through Bash, so spaces inside an argument need no escaping.

# As a simple example, let's download some files with wget. We'll start by creating a 
# list of URLs.

# Note that the list below is spread over several lines. Python ignores line breaks inside 
# square brackets, so no backslashes are needed, and the result is more readable than 
# a single long line. 

urls = ['http://media.sas.upenn.edu/pennsound/authors/Morris/Close-Lstening/Morris-Tracie_04_Discussion2_WPS1_NY_5-22-05.mp3',
        'http://media.sas.upenn.edu/pennsound/authors/Morris/Close-Lstening/Morris-Tracie_05_Theres-Traces_WPS1_NY_5-22-05.mp3',
        'http://media.sas.upenn.edu/pennsound/authors/Morris/Close-Lstening/Morris-Tracie_06_Physical-Plane_WPS1_NY_5-22-05.mp3']

pprint(urls)

['http://media.sas.upenn.edu/pennsound/authors/Morris/Close-Lstening/Morris-Tracie_04_Discussion2_WPS1_NY_5-22-05.mp3',
 'http://media.sas.upenn.edu/pennsound/authors/Morris/Close-Lstening/Morris-Tracie_05_Theres-Traces_WPS1_NY_5-22-05.mp3',
 'http://media.sas.upenn.edu/pennsound/authors/Morris/Close-Lstening/Morris-Tracie_06_Physical-Plane_WPS1_NY_5-22-05.mp3']


In [188]:
# Now we'll import the subprocess package and execute our commands using a for loop.

import subprocess

for item in urls:
    print(['wget', item])            # show the argument list we're about to run
    subprocess.call(['wget',item])

['wget', 'http://media.sas.upenn.edu/pennsound/authors/Morris/Close-Lstening/Morris-Tracie_04_Discussion2_WPS1_NY_5-22-05.mp3']
['wget', 'http://media.sas.upenn.edu/pennsound/authors/Morris/Close-Lstening/Morris-Tracie_05_Theres-Traces_WPS1_NY_5-22-05.mp3']
['wget', 'http://media.sas.upenn.edu/pennsound/authors/Morris/Close-Lstening/Morris-Tracie_06_Physical-Plane_WPS1_NY_5-22-05.mp3']
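
To see that list-style arguments really do survive spaces, here's a sketch that doesn't need wget or a network connection. It launches a second Python interpreter (`sys.executable`) that prints the arguments it received, and captures that output with `subprocess.check_output()`. The filename with spaces is made up.

```python
import subprocess
import sys

# A tiny program for the child process: print every argument it was given.
child_code = 'import sys; print(sys.argv[1:])'

output = subprocess.check_output([sys.executable, '-c', child_code,
                                  'file name with spaces.mp3', 'second'])

# check_output() returns bytes, so decode before printing.
received = output.decode().strip()
print(received)
```

The first argument arrives as a single item even though it contains spaces, with no quoting or escaping on our part.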
