<img align="right" width="300" src="https://data-services.hosting.nyu.edu/assets/libraries_short_color.png" alt="NYU Libraries Logo">

# Intro to Python

**Nicholas Wolf**<br/>

[NYU Data Services](https://guides.nyu.edu/dataservices)

[ORCID 0000-0001-5512-6151](https://orcid.org/0000-0001-5512-6151)

Why Python?

 - Multi-use language (data munging, data analysis, application programming, server-side web programming, big data, little data)
 - Using scripts differentiates input data from output calculations; makes visible workflows; allows for infinitely adjustable data process steps; prevents "hiding" calculations (as in an Excel function)
 - Extensible for any analytical step desired, constraints are minimal
 - Open source, no cost
 - Better for scale, e.g. size limitations are only those of the computational environment, not the software


## Python Environment

Programming in Python can look like many things. Often, people interact with it through a command line interface (also referred to as a "console", a "Python shell", or a "REPL"). Other times, people use it in an IPython or Jupyter notebook (that's what this document is), which allows users to selectively execute written code.

There are also more sophisticated Python development environments, such as [PyCharm](https://upload.wikimedia.org/wikipedia/commons/0/05/PyCharm_4.5.1.png).

In general, a Python environment listens for user input, evaluates that input, and returns whatever the result is.



## Jupyter Notebooks

Here, we use one kind of environment, Jupyter Notebook, a web-based interface built around code cells (= code blocks) that can be run independently but which preserve all code outcomes as though one long script.

Great for:

 - Fine-tuning and re-running workflows that transform or analyze data
 - Producing code that "tells a story"
 - Presenting code with images, multimedia, and visualizations for easier readability by an audience
 - Porting long lines of code in one file to other users for re-running (good for sharing code)
 
Examples

 - [Reinhart-Rogoff Replication](https://nbviewer.jupyter.org/github/vincentarelbundock/Reinhart-Rogoff/blob/master/reinhart-rogoff.ipynb)
 - [Simple Pandas example re: calculating one's earnings](https://nbviewer.jupyter.org/github/Tanu-N-Prabhu/Python/blob/master/Manipulating_the_data_with_Pandas_using_Python.ipynb)
 - [Advanced example showing Pandas and visuals](https://nbviewer.jupyter.org/gist/twiecki/3962843)
 

Try running the code blocks below (click on one and press Shift-Enter). The line that appears below the block is what is returned as a result of running that code.

In [None]:
print("Hello world!")

In [None]:
## Addition

3 + 4 

In [None]:
## An exponent operation 

5**2

In [None]:
## Decimal/"floating point" division

5/6

In [None]:
## Integer division (note the difference compared to above!)

5//6

In [None]:
## Boolean

True 

In [None]:
## Negating a Boolean

not False

Our inputs to Python can roughly be categorized as either **commands** (i.e., requests to do something) or **definitions**.

In [None]:
## Two commands

print(4 + 7)

In [None]:
## Defining a variable

my_location = "New York University"

In [None]:
## Combining them!

print(my_location)

Side note: a Jupyter notebook automatically displays the value of the last expression in a cell, so only the second result appears below:

In [None]:
4+5
2+3

## Introduction to Data Types


Let's focus on **variable definitions** for a moment.

One of the most common things you will be doing is defining the data
that you will be using in the course of programming.

In [None]:
## Whenever we want to use data, we declare and assign it to a "variable", in this case, my_city

my_city = "New York"

In [None]:
current_floor = 6

Above, we've defined two variables. The "single equals" sign performs an assignment operation. One way to think of this is:


> "The variable 'current_floor' *gets* the value 6".



Notice that the actual data (on the right side of the equals sign) is highlighted in different colors across the two variables. This is happening because one variable contains text, or **string** data, while the other contains a number (an **integer**).

These are two of the essential "data types".


In [None]:
## For instance, say we are currently on floor 6 of this building, but
## we know that after this tutorial is finished, there's a book we want
## to check out, and it's on the ninth floor.

destination_floor = 9

In [None]:
## We can figure out how many floors we'll have to travel to get there

floors_I_need_to_walk = destination_floor - current_floor

In [None]:
print(floors_I_need_to_walk)

Using variables allows us to write generalized code that can serve as a formula, even as the specific values saved in these variables change.
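As a small sketch of this idea, the same subtraction can be re-run after changing only the values stored in the variables. The names `start_floor` and `end_floor` and their values are made up for this illustration and are not used elsewhere in the notebook.

```python
# Hypothetical values, chosen only for this illustration
start_floor = 2
end_floor = 9

# The "formula" is written in terms of variable names, not specific numbers
floors_to_walk = end_floor - start_floor
print(floors_to_walk)

# Change one value and re-use exactly the same formula
start_floor = 7
floors_to_walk = end_floor - start_floor
print(floors_to_walk)
```

The line that computes `floors_to_walk` is identical both times; only the data stored in the variables changed.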

### List of data types


*   string : `"Hello there"`
*   boolean (true/false) : `True`
*   integer : `4`
*   decimal (float) : `4.0`

Different operations are defined for each data type. For instance, you can *negate* any boolean value (turn a `True` into a `False`, or vice versa); you can *concatenate* strings together; you can *add* or *subtract* integers and decimals.
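A minimal sketch of the operations just mentioned, one per data type. The variable names are chosen for this illustration only.

```python
negated = not True               # booleans can be negated
greeting = "Hello" + " there"    # strings can be concatenated
total = 4 + 3                    # integers can be added
difference = 4.0 - 1.5           # decimals (floats) can be subtracted

print(negated)
print(greeting)
print(total)
print(difference)
```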

You can check the type of any data using the `type()` function.









In [None]:
type(4)

In [None]:
type("Hello")

In [None]:
type(floors_I_need_to_walk)

### Reserved Keywords


In Python, some words are "reserved" and cannot be used as names for variables or functions (identifiers).

For instance, the word `True`, when it does not appear within quotation marks, will always evaluate to the _value_ **True** (and similar for `False`). Other reserved tokens are those that comprise the actual language, such as `if`, `else`, `return`, and so on.

https://www.programiz.com/python-programming/keyword-list
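If you would rather check from inside Python, the standard library's `keyword` module lists the reserved words for the version you are running. This is a short sketch; the names being checked are only examples.

```python
import keyword

# The full list of reserved words for this version of Python
print(keyword.kwlist)

# Check individual names
if_is_reserved = keyword.iskeyword("if")
true_is_reserved = keyword.iskeyword("True")
var1_is_reserved = keyword.iskeyword("var1")

print(if_is_reserved, true_is_reserved, var1_is_reserved)
```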


In [None]:
True

In [None]:
var1 = 10
var2 = "Hey!"
True = 3.4 ## Doesn't work!


In [None]:
## This, however, is completely fine...

var3 = True 
var3

# Conditional Statements


Conditional if-statements are a very big part of programming with any language. We can think about it first using some pseudocode:

```
if ( it is raining outside ):
  grab an umbrella before leaving
else:
  leave the umbrella at home
```

Conditional statements are a way to specify sections of code that should be executed only when a condition is true. You can think of the if and else lines as being gatekeepers to the code that exists right below them.


In [None]:
## Only when the condition after `if` evaluates to True do we
## run the associated code in the body

iHaveMoney = False

if iHaveMoney:
    print("I will buy ice cream!")
else:
    print("I can't buy ice cream.")

In [None]:
friendHasMoney = False

if iHaveMoney or friendHasMoney:
    print("We will buy ice cream!")
else:
    print("We can't buy ice cream.")

Multiple conditions can be combined in a single if-statement by joining them with the words "or" and "and". These words are logical operators in Python. (Unlike in some other languages, parentheses around the condition are optional.)

As long as the expression after `if` evaluates to a logical value (i.e., either **True** or **False**), it can be used within an if-statement.

In [None]:
 3 > 5 and 3 > 1 

In [None]:
 3 > 5 or 3 > 1 

Statements on *both* sides of an "and" must evaluate to `True` for the interpreter to enter the body of code following it.

### Challenge 1

```python
if True or False:
  print("A")
  
if True and False:
  print("B")
  
```

What will be printed as a result of running the above code block?


1.    A

      B
      
2.    A
3.    B
4.    


In [None]:
## Try it out!

if True or False:
    print("A")
    
if True and False:
    print("B")

In [None]:
## Testing equality

print( 3 == (4 - 1) )

## use the double equals to test equality, since single equals
## is only used for assigning values

print( 3 != (4 - 1) )

## the != operator means 'not equal'

In [None]:
hour = 10
if hour > 7 and hour <= 12:
    print("Good morning!")
elif hour > 12 and hour <= 18:
    print("Good afternoon!")
elif hour > 18 and hour <= 22:
    print("Good evening!")
else:
    print("You're too late or too early!")


### Nesting if-statements

If-statements can be arbitrarily nested in your code:

In [None]:
## For this example, we'll import a function that generates random values
from random import random

## Here we use the random number generator to find some value
## between 0 and 1
random_number = random()
print("My random number is: " + str(random_number))

In [None]:
if random_number < 0.5:
    print("The number is less than .5")
    if random_number < 0.25:
        print("... in fact, it's even less than .25")
else:
    print("The number is greater than (or equal to) .5")

[Click here for an illustration](https://docs.google.com/drawings/d/1ql_x12TyNqkylYqe_tFrqJwbPGgC49qZ8I5wbv2eOtg/edit) of the logic in the if-statement above.

What is the difference between Else-If and Nested If statements?
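One way to explore this question is to run both forms on the same value. The sketch below uses a fixed, made-up `value` instead of a random one so that the result is repeatable; `elif_result` and `nested_result` are names invented for this illustration.

```python
# A hypothetical fixed value, used here instead of random()
value = 0.2

# elif: the branches are alternatives, and only the first match runs
if value < 0.25:
    elif_result = "less than .25"
elif value < 0.5:
    elif_result = "less than .5"
else:
    elif_result = ".5 or more"
print(elif_result)

# nested: the inner test is only reached when the outer test is True,
# and both the outer and inner bodies can run
if value < 0.5:
    nested_result = "less than .5"
    if value < 0.25:
        nested_result = nested_result + ", and also less than .25"
else:
    nested_result = ".5 or more"
print(nested_result)
```

Try changing `value` to 0.4 and to 0.9 to see which branches run in each version.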

# Essential Data Structures

## Lists

In [None]:
## Let's say I'm teaching a tutorial, and I want to save the names of all of my students

## One way is to create a variable for each one, and save one name per variable

student_1 = "Alice"
student_2 = "Xiaojing"
student_3 = "Frank"

## This works ok... but it makes more sense to keep a single list!

my_students = ["Alice", "Xiaojing", "Frank", "George", "Amy", "Anita"]

In [None]:
my_students

*Elements* within a list can be accessed by their *index number*

*Index numbers* start at 0, meaning that the first element in the list is at index 0

In [None]:
## You access elements of a list using this notation

my_students[0] 
my_students[1]

## You can even use an expression in the place of the index, as long
## as that expression turns into a valid index

my_index = 1 + 2
print(my_students[my_index])


In [None]:
my_students[0:3] ## Take a subset of the list, starting at 0, stopping before 3

In [None]:
my_students[-1] ## Access the list from the right (the end of it) using a negative index

In [None]:
my_students[::2] ## Step through 2 at a time

In [None]:
my_students[::-1] ## Reverse the list

Note: we reversed the list above, but what happens when you look at the value of `my_students` again?

In [None]:
my_students

In [None]:
my_students.append("Jacob") ## Add a value to the end of a list

my_students

In [None]:
my_students.pop(2) # remove element by index
my_students

In [None]:
late_registrations = [ 'Sally', 'Jose' ]

## Here we are concatenating two lists
## The `+` sign means addition for integers, but on lists it means concatenate

print(my_students + late_registrations)
print(my_students)

In [None]:
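my_list = my_students.copy() ## Work on a copy so `my_students` is unchanged
print(my_list)
my_list.append(late_registrations) ## append() adds the whole list as ONE nested element
print(my_list)
print(my_students+late_registrations) ## `+` joins the elements of both lists instead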

In [None]:
## We are saving the result back to the variable called `my_students`
my_students = my_students + late_registrations

In [None]:
## Checking the length of a list
len(my_students)

### Testing membership of a list

In [None]:
if 'Anita' in my_students:
    print("What's up Anita!")
else:
    print("Oh, is Anita not in this class?")

## Challenge 2
### Write some code that picks a student from your list `my_students` at random

Hint: you can turn a decimal number into an integer with the `int(x)` function, which drops the decimal part (for positive numbers, this rounds down):
```python
int( 4.6 ) ## this equals 4
```


In [None]:
my_students

## Dictionaries

While lists are a great way to store any sort of ordered data that it makes sense to access via a numerical index, sometimes we want to find pieces of data based on other types of input, such as a string.

Another way to put this is that, given some **key**, we want to find the corresponding **value**.

For example, it might be easier to keep my dataset of students in a form where I can find someone's name given their NYU net id.

In [None]:
netid_students = {
    "az123": "Alice",
    "xa746": "Xiaojing",
    "fj32" : "Frank",
    "gm330": "George",
    "aa21" : "Amy",
    "ar987": "Anita",
    "g3745": "George"
}

## Now we can retrieve a name by using a netID as a key
print(netid_students['aa21'])

We can actually store anything we want as values in a dictionary, including lists, strings, numbers, or even other dictionaries.
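As a quick sketch, here is a single dictionary whose values have several different types. The `course` dictionary and everything in it are made up for this illustration.

```python
# A hypothetical dictionary whose values have different types
course = {
    "title": "Intro to Python",                 # string
    "seats": 30,                                # integer
    "is_full": False,                           # boolean
    "students": ["Alice", "Xiaojing"],          # list
    "instructor": {"name": "Sam", "floor": 5}   # another dictionary
}

# Nested values need one key (or index) per level
first_student = course["students"][0]
instructor_name = course["instructor"]["name"]

print(first_student)
print(instructor_name)
```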

### Using dictionaries as records

Often, people use dictionaries to store different types of information about whatever they are trying to model:

In [None]:
student1 = {
    "name" : "Alice",
    "netid": "az123",
    "currently_enrolled": True,
    "credits_completed": 32,
    "school": "GSAS",
    "enrolled_in": ["Intro to Python", "Fundamental Algorithms"]
}

student2 = {
    "name" : "Xiaojing",
    "netid": "xa746",
    "currently_enrolled": True,
    "credits_completed": 16,
    "school": "Wagner",
    "enrolled_in": [
        "Intro to Python",
        "Natural Language Processing",
        "Urban Planning",
        "Shakespeare"]
}

student_profiles = [student1, student2]

# Loops


Looping is one of the most fundamental operations in programming, but can be tricky to learn for the first time. Python has several ways of performing loops.

In [None]:
print(my_students)

In [None]:
## Print each element of a list
for student_name in my_students:
    print(student_name)

What's happening in the above example? The line `for student_name in my_students:` defines a **loop** through every element in `my_students`.

Each element gets assigned to the temporary variable `student_name`, and then the body of the loop is executed; in this case, the body is only printing the value of that temporary name.

Also notice that the line `print(student_name)` is indented. Indentation is part of Python's syntax: the body of any loop, conditional, or function definition always needs to be indented.

In [None]:
## We use a function to automatically create a list of numbers, starting at 0,
## and ending at 19
my_numbers = list(range(0,20))

my_numbers

In [None]:
for number in my_numbers:
    if number == 5:
        print("Hey, I love 5!")
    else:
        print(number)

Not only can we loop through lists, we can loop through dictionaries. With lists, we loop through them one element at a time (as seen above). Looping through dictionaries is slightly different: we iterate through them one **key-value** pair at a time:

In [None]:
## Notice that in order to loop through a dictionary's key-value pairs, we have
## to append .items() to the name of the dictionary

for netid, name in netid_students.items():
    print("The student with netid: " + netid + " has the first name " + name)

In [None]:
print(student_profiles)

In [None]:
## Looping through a list that contains dictionaries

for student in student_profiles:
    if "Fundamental Algorithms" in student['enrolled_in']:
        print(student['name'])

We can also loop indefinitely, as long as some constraint remains true, using a **while-loop**:

In [None]:
x = 0
while x < 100:
    print(x)
    x = x + 10
    ## Don't forget to update the value of x or you will be stuck in
    ## an infinite loop!

In general, while-loops are helpful for performing some activity when you don't know before you start how many times the loop will need to run. For-loops are helpful in cases when you do know exactly how many times you want it to run.
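The contrast can be sketched as follows. The starting value of 100 and the names `value` and `halvings` are made up for this illustration.

```python
# for-loop: we know in advance that the body should run exactly 3 times
for attempt in range(3):
    print("Attempt number", attempt)

# while-loop: we do not know in advance how many times we must halve
# the starting value before it drops below 1
value = 100
halvings = 0
while value >= 1:
    value = value / 2
    halvings = halvings + 1

print("Halvings needed:", halvings)
```

Here the while-loop discovers the answer by running; the number of repetitions is a result of the loop rather than an input to it.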

Text strings are sequences of characters, so you can loop through them much as you would a list:

What does this do?
```
word = 'lead'
for char in word:
    print(char)

(a) char
(b) lead
(c)     lead
(d) aaaa
(e) rrrr
(f) l
    e
    a
    d
 ```

# Working With Strings

Python comes with many pre-defined functions that work on text strings. Combined with the ability to traverse a string using loops (as shown above), there are tons of ways to work with text.

In [None]:
## Convert text to all lowercase or uppercase

my_string = "Hello World!"

print(my_string)
print(my_string.lower())
print(my_string.upper())

In [None]:
## Split a string

intro_text = ("Python is a widely used high-level programming language for general-purpose programming, "
              "created by Guido van Rossum and first released in 1991.")

tokens = intro_text.split(" ") ## Here, we split on the whitespace character
tokens ## We end up with a list of tokens

In [None]:
## Joining a list into a string
## (the reverse of a split!)

print(" ".join(tokens))

print(", ".join(my_students))

In [None]:
## Making substitutions with replace()
addresses = "124 Fake St., 325 Broadway Ave., 718 Washington Ave., 70 W. 4th St."
print(addresses)

normalized_addresses = addresses.replace("St.","Street").replace("Ave.","Avenue")
print(normalized_addresses)


In [None]:
## Reversing a string
print(normalized_addresses[::-1])
## This uses the slice notation, with start and end set to the beginning and end
## of the string, but the "step" parameter set to -1

But what if our strings have quotes in them? How do we print a statement like this:
> Wait, she's going to say "Hello!"



In [None]:
print("Wait, she's going to say \"Hello!\"")
print('Wait, she\'s going to say "Hello!"')
print("""Wait, she's going to say "Hello!" """)

Strings can be created in a number of different ways:

```
'single quotes'
"double quotes"
""" triple-double quotes """  #This can contain line breaks!
```
or even like this:
```
''' triple-single 
    quotes '''
```
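To see what "can contain line breaks" means in practice, here is a small sketch comparing a triple-quoted string with the same text written using the `\n` newline escape. The variable names are chosen for this illustration.

```python
# A triple-quoted string keeps the line breaks typed inside it
multi_line = """First line
Second line"""
print(multi_line)

# The same text written on one line with an explicit newline escape
single_line = "First line\nSecond line"
same_text = multi_line == single_line
print(same_text)
```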



# Functions


You may want to define a function when the same sequence of calculations needs to be executed at several different places in your code. Functions make programs more concise and easier to maintain.


Functions are defined with the following syntax:  


In [None]:
def fun(x):
    print(x)

 - `def` tells Python that you are writing a function definition. The `def` line ends with a colon.
 - `fun` is the name of the function, which is used to call the function later on.
 - `x`, within the parentheses, is a parameter passed into the function for use in its body.

The lines that follow are the body of the function and should be indented.


In [None]:
##Suppose we have the following String variables
x = "You just called a function on x"
y = "You just called a function on y"

In [None]:
##We can call our function `fun` on each of x and y
fun(y)
fun(x)


Notice how the function `fun` simply prints the parameter we pass in. Alternatively, functions can have a return value, which is something retrieved or calculated within the function and handed back to the caller.
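The difference between printing and returning can be sketched with two small functions. The names `show_double` and `get_double` are made up for this illustration.

```python
def show_double(x):
    print(x * 2)    # displays the result, but gives nothing back

def get_double(x):
    return x * 2    # hands the result back to the caller

printed = show_double(4)    # 8 is displayed; `printed` holds None
returned = get_double(4)    # nothing is displayed; `returned` holds 8

print(printed)
print(returned)
```

A function without a `return` statement gives back the special value `None`, so only a returned value can be stored in a variable or used in a later calculation.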


In [None]:
##Suppose we have a function that adds 5 to a given integer
def addFive(x):
    x = x + 5  #In this line, we are assigning a new value to x
    return x   #Return is what the function will give back

In [None]:
##Now let's call addFive on the integer 50
addFive(50)

In [None]:
##What does this return?
5 + addFive(5)

In [None]:
##What does this return?
addFive(addFive(5))

Function calls can be nested; nested calls are evaluated from the inside out.
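A step-by-step sketch of that evaluation order, splitting the nested call into two separate lines. `addFive` is repeated here (in a shorter but equivalent form) so the block runs on its own; `inner` and `outer` are names made up for this illustration.

```python
def addFive(x):
    return x + 5

inner = addFive(5)        # the inner call runs first
outer = addFive(inner)    # the outer call then uses that result

print(inner)
print(outer)
print(outer == addFive(addFive(5)))
```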

In [None]:
##Let's say I want a function that adds 5 to only negative integers
def add_to_negatives(x):
    if (x < 0):
        return x + 5
    else:
        return x

In [None]:
##What is the output of calling
add_to_negatives(-2) - add_to_negatives(3)

There is no rule for how complex a function definition should be, but keeping it concise and focused makes it easier to call often.

In [None]:
##Functions can be as simple as a check to see if a number has more than two digits
def digits_check(x):
    if (x > 99 or x < -99):
        return True
    else:
        return False

In [None]:
digits_check(100)

In [None]:
digits_check(-88)

In [None]:
# functions are re-usable pieces of code that can make your script more compact
# functions also make it easier to compartmentalize the code
def welcome_time(hour):
    if hour > 7 and hour <= 12:
        return 'Good morning!'
    elif hour > 12 and hour <= 18:
        return 'Good afternoon!'
    elif hour > 18 and hour <= 22:
        return 'Good evening!'
    else:
        return 'You\'re too late or too early!'

print(welcome_time(10))

In [None]:
def welcome(name,h):
  print(welcome_time(h).replace('!',', ' + name + '!'))
  
welcome('Paul',10)
welcome('Sally',20)
welcome('Gene',5)

## (Group) Challenge 3

**Background:** You're doing a project on linguistics and want a way to find all of the palindromes that commonly appear in pieces of literature. But where to begin???

#### Part 1
### Write a function, `is_palindrome(x)` that takes a single word (as a string) as input, and returns either `True` or `False` depending on if the input is a palindrome
e.g.
`is_palindrome("Hello world!")` should return **`False`**.

however, `is_palindrome("tacocat")` should return **`True`**.



In [None]:
def is_palindrome(x):
    ## Fill this in!
    return False


# Help and additional functions

*   Python documentation, e.g. https://docs.python.org/3/library/math.html
*   StackOverflow e.g. [calculate logarithm in python](https://www.google.com/search?q=calculate+logarithm+in+python): https://stackoverflow.com/questions/33754670/calculate-logarithm-in-python



In [None]:
import math
print(math.sqrt(256))
print(math.log(100))                 ## natural logarithm (base e)
print(math.log(100)/math.log(10))    ## change of base: logarithm base 10

# Quick Last Things: Pandas

#### Using Pandas...and NOT using Pandas

Pandas can be a powerful tool, especially for those who have a background in other statistical software and are looking for a way to work with tabular data.

It introduces a new object for "holding" data, the Pandas dataframe, akin to a two-dimensional matrix or spreadsheet table.

But it isn't the only (or in some cases even the best) means of dealing with data munging or data analysis in Python, particularly for large data. Its dataframes eat up memory, and can be inefficient with certain calculations.
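As a rough sketch of the non-Pandas route, the standard library's `csv` module can read tabular text into ordinary lists and dictionaries, which can then be handled with the loops and if-statements covered earlier. The CSV text and column names below are made up for this illustration, and `io.StringIO` stands in for a file on disk.

```python
import csv
import io

# Hypothetical CSV text standing in for a file on disk
csv_text = """Person,Residence
Student 1,New York
Student 2,Vermont
Student 3,Hawaii
"""

# DictReader turns each row into a dictionary keyed by the header row
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Filter rows with an ordinary loop and if-statement
for row in rows:
    if row["Residence"] != "Vermont":
        print(row["Person"], "lives in", row["Residence"])
```

This approach reads every value as a string, so any numeric columns would need to be converted (for example with `int()`) before doing arithmetic on them.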


In [None]:
import pandas as pd

my_first_dataframe = pd.DataFrame([
                                   ["3/12/1965","Student 1", "New York"],
                                   ["4/8/1976","Student 2", "Vermont"],
                                   ["7/8/2000", "Student 3", "Hawaii"]
                                  ],
    columns = ["DOB", "Person", "Residence"])

my_first_dataframe

# Loading a DataFrame

We have several options for how to make a dataframe and start working in Pandas:

1. We can load a tabular data file (e.g. a CSV) and allow Pandas to parse it as a dataframe

2. We can instantiate an empty dataframe and add rows or columns in the form of Series objects

3. We can transform a nested Python data structure (such as a list of lists or a list of dictionaries) into a dataframe

No matter which approach is taken, I recommend taking some time to set the various parameters of the pd.DataFrame object so that your work on the dataframe later has expected results. This includes setting column names, column order, data types of variables, and (when reading from file) encoding.

In [None]:
df_from_csv = pd.read_csv("water-consumption-nyc-csv-example.csv", delimiter = ",",
                          header = 0, names = ["Year","NYC_Pop", "Consumption","PerCapita_Consumption"],
                          dtype = {"Year":int, "NYC_Pop":int, "Consumption":int, "PerCapita_Consumption":int})

# We can use .head() and .tail() to preview just a portion of a dataframe.
# Pass as an optional parameter the number of rows you wish to see

df_from_csv.head(5)

### Pandas advantage example: super-complex filtering

Using the `.loc` indexer of a dataframe, we can pass a filter using a condition set on the values in one of the columns.

In [None]:
df_from_csv.loc[df_from_csv.Year < 1983]

In [None]:
## Much more complex example
# Note that we use | (pipe) and & for "OR" and "AND" Booleans, respectively. We MUST use parentheses 
# to separate out each Boolean to be evaluated. 

df_from_csv.loc[(df_from_csv.Year < 1983) | (df_from_csv.Year == 1989)]


# Challenge Answers

More Python resources [here](https://guides.nyu.edu/python)

## Challenge 1

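The correct answer is 2: `True or False` evaluates to `True`, so "A" is printed, but `True and False` evaluates to `False` (nothing is True and False at the same time), so "B" is not.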

## Challenge 2

* random() gives a floating point value between 0 and 1, not including 1
* If there are 9 students, random() * 9 is a number from 0 up to (but not including) 9
* len(my_students) gives the number of elements in the list
* int() drops the decimal part, which rounds a positive number down to a valid index


In [None]:
my_students[int(random()*len(my_students))]
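As an alternative sketch, the standard library's `random` module also provides `choice`, which picks one element from a non-empty sequence directly. The `students` list below is a stand-in so the block runs on its own; in the notebook you could pass `my_students` instead.

```python
from random import choice

# A stand-in list; in the notebook you would use my_students directly
students = ["Alice", "Xiaojing", "Frank"]

# choice() returns one element of the list, picked at random
picked = choice(students)
print(picked)
```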

## Challenge 3

In [None]:
def is_palindrome(x):
    if x == x[::-1]:
        return True
    else:
        return False

is_palindrome("tacocat")