# Basic Python Reference Guide

## Lists and Dictionaries

In [None]:
# This is a list
my_list = ["cat", "bat", 1, 2, 3, 4, True]
print(my_list)

You can add or remove items from a list using the following methods.

In [None]:
# Add to the end of the list
my_list.append("rat")
print(my_list)

# Insert an item at a given position - the first argument is the index, the second is the item to insert
my_list.insert(6, 5)
print(my_list)

# Remove item from list
my_list.remove("rat")
print(my_list)

# Or can use the del statement and index of item to remove
del my_list[7]
print(my_list)

You can access items within a list using the following methods.

*Note: In Python, the index starts at 0*

In [None]:
# Access items with the index
print(my_list[1])

# Indexed items can also be used in string concatenation - in a notebook this displays without print() too
print("I have a " + my_list[0])
print("I have a " + my_list[0] + " who is " + str(my_list[6]) + " years old")

# Or find the index of a given item
print(my_list.index('cat'))

In [None]:
# Use slicing to create a subset of the original list (the end index is not included)
shorter_list = my_list[2:6]
shorter_list

In [None]:
# Using lists in loops
for item in my_list:
    print(item)

for i in range(len(my_list)):
    print('Index ' + str(i) + ' in list is: ' + str(my_list[i]))

# This loop does the exact same as the one above using the enumerate function
for index, item in enumerate(my_list):
    print('Index ' + str(index) + ' in list is: ' + str(item))

In [None]:
# The in and not in operators return a boolean value
print('cat' in my_list)
print('bat' not in my_list)

if 'cat' not in my_list:
    print('There is no cat')
else:
    print('There is a cat!')

You can select a random item from a list or shuffle a list in place with the `random` module.

In [None]:
# Using the random choice and random shuffle functions from the random module
import random
print(random.choice(my_list))

random.shuffle(my_list)
my_list

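**Dictionaries** are similar to lists, but they store key-value pairs, are accessed by key rather than by position, and use curly braces. (Since Python 3.7, dictionaries keep their items in insertion order.)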

In [None]:
# This is a dictionary
my_dict = {"cat":"Oliver", "color":"black", "age":5}
print(my_dict)

In [None]:
# Get the keys from a dictionary
print(my_dict.keys())

# Get the values from a dictionary
print(my_dict.values())

# Get item pairs
print(my_dict.items())

In [None]:
# Using dictionaries in loops
for v in my_dict.values():
    print(v)

In [None]:
# Look up a key with .get() - an optional second argument sets the value returned when the key is not found
print(my_dict.get('cat', ' ')) # If the key is not found, it will return the second argument

# Or can use the setdefault method
my_dict.setdefault('color', 'white')
print(my_dict.get('color'))

# If item/key does not exist, this will create it
my_dict.setdefault('species', 'feline')
print(my_dict.get('species'))

#### Summary
In sum, lists and dictionaries differ in several ways (e.g., a list is written with square brackets and indexed by position, whereas a dictionary is written with curly braces and indexed by key), but both support looping, membership tests with `in`, and adding or removing items.  

**Methods for Lists**
List Methods  | Description               | Arguments     
--------------|---------------------------|-----------------------
.append()     | Add item to end of list   | item to add
.insert()     | Add item anywhere in list | index of list, item to add
.remove()     | Remove item from list     | item to remove
.sort()       | Sort items in list        | Optional: reverse=True/False, key=str.lower
.index()      | Get index of item in list | item
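
The `.sort()` method appears in the table above but not in the earlier cells. A minimal sketch of its optional arguments follows; the list `animals` is made up for this illustration and is not part of the notebook.

```python
# Hypothetical list used only for this illustration
animals = ["cat", "Bat", "rat", "ant"]

# Default sort is case-sensitive: uppercase letters sort before lowercase
animals.sort()
case_sensitive = list(animals)

# key=str.lower compares items as if they were all lowercase
animals.sort(key=str.lower)
case_insensitive = list(animals)

# reverse=True sorts in descending order
animals.sort(key=str.lower, reverse=True)
descending = list(animals)

print(case_sensitive)
print(case_insensitive)
print(descending)
```

Note that `.sort()` changes the list in place and returns `None`. Sorting a list that mixes strings and numbers, such as `my_list` above, raises a `TypeError` in Python 3.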

**Methods for Dictionaries**
Dictionary Methods | Description               | Arguments
-------------------|---------------------------|---------------
.keys()            | Get keys of dictionary    | n/a
.values()          | Get values of dictionary  | n/a
.items()           | Get dict item pairs       | n/a
.get()             | Get value for a key (or a default if the key is missing) | key, optional default
.setdefault()      | Set default value for key | key, value
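
As a rough illustration of the dictionary methods above, the sketch below loops over key-value pairs and checks for a key. It redefines `my_dict` so that it runs on its own; the `weight` key is a made-up example of a missing key.

```python
# Same dictionary as in the cells above, redefined so this block runs on its own
my_dict = {"cat": "Oliver", "color": "black", "age": 5}

# .items() yields (key, value) tuples that can be unpacked in the loop
pairs = []
for key, value in my_dict.items():
    pairs.append(f"{key} = {value}")
print(pairs)

# The in operator checks keys, not values
has_color = "color" in my_dict
has_black = "black" in my_dict
print(has_color, has_black)

# .get() returns the default instead of raising KeyError for a missing key
weight = my_dict.get("weight", "unknown")
print(weight)
```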

## Working with Strings

In [None]:
# Easy way of inserting variables into a string - can use regular variables or items from a dictionary
'My name is %s. I am %s years old and I am %s.' % (my_dict['cat'], my_dict['age'], my_dict['color'])


In [None]:
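# Using f-strings is even easier - any expression, including a dictionary lookup, can go inside the curly braces
cat = "Oliver"
print(f'My name is {cat}.')
# Before Python 3.12, the quotes inside the braces must differ from the ones around the string
f'My name is {my_dict["cat"]}.'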

In [None]:
# Using the join and split methods
myString = "    My name is Oliver    "
mySplitString = myString.split()
print(mySplitString)
print(' '.join(mySplitString))

# Can also use join on items in a list - every item must be converted to a string first
', '.join(str(item) for item in my_list)

#### Strings Summary
The same indexing, slicing, and in/not in operators that work on lists can also be used with strings.

String concatenation can be used to join multiple strings together, but an easier (and prettier) way is `%s` formatting. An even easier way of doing string interpolation is to use f-strings, where you insert variable names (or any other expression) directly into the string inside curly braces.  

Useful methods for strings:
* .isupper()
* .islower()
* .upper()
* .lower()
* .startswith()
* .endswith()
* .join()
* .split()
    + Optional argument can be passed to specify where you want to split the string (e.g., '\n' for splitting on each new line)
* .strip()
    + Also rstrip() and lstrip() for removing whitespace (or other values) on only the right side or left side, respectively
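
A short sketch of several of the methods listed above, applied to a made-up block of text (the sample string and variable names are assumptions for illustration only):

```python
# Hypothetical sample text for illustration
raw = "  Oliver,black,5\nMilo,orange,3  "

# .strip() removes leading and trailing whitespace
clean = raw.strip()

# .split('\n') splits on newlines; .split(',') splits each line on commas
rows = [line.split(",") for line in clean.split("\n")]
print(rows)

# .upper() returns a new string; the original is unchanged
names_upper = [row[0].upper() for row in rows]
print(names_upper)

# .startswith() and .endswith() return booleans
starts = clean.startswith("Oliver")
ends = clean.endswith("3")
print(starts, ends)

# .isupper() is True only if all cased characters are uppercase
all_caps = "OLIVER".isupper()
mixed_caps = "Oliver".isupper()
print(all_caps, mixed_caps)
```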

## Pattern Matching with Regex

To use regular expressions you need to import the `re` module.

In [None]:
import re

vowels = re.compile(r'[aeiou]')

#### Regex Summary

Methods for `re` module
Method      | Description       
------------|-------------------
.compile()  | Creates the regex object
.search()   | Searches the given text for the regex; returns a `Match` object for the first match, or `None` if there is none
.group()    | Called on the `Match` object returned by .search(); returns the matched text
.findall()  | Returns a list of all matches
.sub()      | Replace any matches with the given string
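
The cell above only compiles the pattern. A minimal sketch of how the other methods in the table might be used with it follows; the sample sentence is an assumption chosen for illustration.

```python
import re

# Same pattern as in the cell above: any single lowercase vowel
vowels = re.compile(r'[aeiou]')

text = "My name is Oliver"  # sample text, chosen for illustration

# .search() returns a Match object for the first match, or None
match = vowels.search(text)
first_vowel = match.group() if match else None
print(first_vowel)

# .findall() returns every non-overlapping match as a list of strings
all_vowels = vowels.findall(text)
print(all_vowels)

# .sub() replaces every match with the given string
masked = vowels.sub('*', text)
print(masked)
```

The capital `O` is left alone because the character class only lists lowercase letters; passing `re.IGNORECASE` as a second argument to `re.compile()` would match it as well.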


Regular Expressions
Regex   | Description
--------|---------------------
\d      | any digit from 0 to 9
\D      | any character NOT a digit
\w      | any letter, numeric digit, or underscore
\W      | any character NOT a letter, numeric digit, or underscore
\s      | any space, tab, or newline
\S      | any character NOT space, tab, or newline
'*'     | match zero or more times
'+'     | match one or more times
'?'     | match zero or one time
'.'     | any character except newline


## Working with Files

Modules that you will need to work with files: `os` (stands for operating system), `shutil` (shell utilities), `pathlib`, and `sys`

In [None]:
import os, shutil, sys
from pathlib import Path

In [None]:
# These two functions both return the current working directory (Path.cwd() as a Path object, os.getcwd() as a string)
Path.cwd()
os.getcwd()

In [None]:
for folder, subf, filenames in os.walk('./Desktop'):
    for filename in filenames:
        if filename.endswith('.csv'):
            print(filename)

In [None]:
p = Path('./Desktop')
list(p.glob('*.md'))

In [None]:
with open('NAME OF FILE', 'r') as myfile:
    # The with statement closes the file automatically, so no close() call is needed
    contents = myfile.read()
print(contents)

Methods to use with the `Path` class
Method      | Description
------------|---------------------
.cwd()      | Return the current working directory
.home()     | Return the home directory
.mkdir()    | Make a new directory
.glob()     | Return a generator of matching files in the directory - needs to be used on a Path object (see above example)
.exists()   | Checks to see if a directory or file exists
.is_file()  | Checks to see if a given object is a file
.is_dir()   | Checks to see if a given object is a folder/directory
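
The following sketch tries out several of the `Path` methods above inside a temporary directory, so nothing is left on disk. The folder and file names are hypothetical, and `write_text()` (not listed in the table) is used only to create test files.

```python
import tempfile
from pathlib import Path

# Work inside a temporary directory so nothing on disk is left behind
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # .mkdir() creates a directory; the / operator joins path parts
    notes_dir = base / "notes"
    notes_dir.mkdir()

    # Create two small files to search for (hypothetical names)
    (notes_dir / "todo.md").write_text("- feed Oliver\n")
    (notes_dir / "data.csv").write_text("name,age\nOliver,5\n")

    # .glob() returns a generator, so wrap it in list() or sorted()
    md_files = sorted(p.name for p in notes_dir.glob("*.md"))

    dir_exists = notes_dir.exists()
    is_dir = notes_dir.is_dir()
    is_file = (notes_dir / "todo.md").is_file()
    missing = (notes_dir / "nope.txt").exists()

print(md_files, dir_exists, is_dir, is_file, missing)
```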

Methods to use with the `os` module
Method      | Description
------------|------------------
.makedirs() | Makes a directory, including any missing intermediate directories
.getcwd()   | Returns the current working directory
.chdir()    | Changes the current working directory
.listdir()  | Lists all files and folders in a given directory
.walk()     | Yields the folder name, subfolders, and filenames for each folder in the given directory tree
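
A minimal sketch of `os.makedirs()`, `os.listdir()`, and `os.walk()` working together is shown below. It runs in a temporary directory, and the nested folder names and the CSV file are assumptions made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # os.makedirs() creates the whole chain of nested folders at once
    nested = os.path.join(tmp, "projects", "cats", "data")
    os.makedirs(nested)

    # Put one file at the bottom so os.walk() has something to report
    with open(os.path.join(nested, "oliver.csv"), "w") as f:
        f.write("name,age\nOliver,5\n")

    # os.listdir() lists only the immediate contents of a directory
    top_level = os.listdir(tmp)

    # os.walk() yields (folder, subfolders, filenames) for every folder in the tree
    csv_files = []
    for folder, subfolders, filenames in os.walk(tmp):
        for filename in filenames:
            if filename.endswith(".csv"):
                csv_files.append(filename)

print(top_level, csv_files)
```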

## Data Manipulation and Analysis with Pandas

In [1]:
# Import the pandas module
import pandas as pd

# Load the data 
df = pd.read_csv('data.csv')

# Other functions for loading data include pd.read_excel(), pd.read_spss(), pd.read_json(), pd.read_html(), and pd.read_xml()

In [None]:
# Using .describe() returns summary statistics for numeric data
# Adding the 'include' argument will return information for categorical data
df.describe(include=['O'])

Methods to use with the `pandas` module

Method                      | Description
----------------------------|------------------
.read_csv()                 | Reads a csv file into a dataframe
.to_csv()                   | Writes a dataframe to a csv file
.head()                     | Returns the first n rows of a dataframe
.tail()                     | Returns the last n rows of a dataframe
.info()                     | Prints a summary of the dataframe: column names, dtypes, non-null counts, and memory usage
.describe()                 | Returns the count, mean, standard deviation, min, quartiles (including the median), and max for numeric columns
df['var_name'].mode()       | Returns the most common value(s) in a variable within a dataframe
df['var_name'].skew()       | Returns the skewness of a variable within a dataframe
df['var_name'].kurt()       | Returns the kurtosis of a variable within a dataframe
df['var_name'].value_counts() | Returns the number of times each value occurs within a variable within a dataframe
df.groupby(['var_name'])    | Groups the dataframe by the specified variable - can use methods above to get grouped values
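
The sketch below applies a few of these methods to a small dataframe built in memory. The column names and values are invented for illustration and are not the contents of `data.csv`.

```python
import pandas as pd

# Made-up data for illustration; not the contents of data.csv
df = pd.DataFrame({
    "species": ["cat", "cat", "dog", "dog", "dog"],
    "age": [5, 3, 7, 2, 7],
})

# .head(n) returns the first n rows (default is 5)
first_two = df.head(2)

# .mode() returns a Series because there can be ties; [0] takes the first mode
most_common_age = df["age"].mode()[0]

# .value_counts() counts each distinct value, most frequent first
species_counts = df["species"].value_counts()

# groupby() followed by a method computes that statistic per group
mean_age_by_species = df.groupby("species")["age"].mean()

print(first_two)
print(most_common_age)
print(species_counts)
print(mean_age_by_species)
```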


## Web Scraping  

Main useful modules are:
* `requests` - can download webpages or JSON data from webpages
* `bs4` - BeautifulSoup4 is used for parsing HTML data (e.g., after getting data with `requests`)
* `webbrowser` - used to open webpages
* `selenium` - used to open and control webpages

In [2]:
import webbrowser, sys, requests, selenium, bs4, pyperclip

### Using the webbrowser module

In [None]:
# Can open any URL using webbrowser
search = 'Fredericksburg, VA'
webbrowser.open('https://www.google.com/maps/place/' + search)
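
Browsers usually tolerate spaces and commas in a URL, but it can be safer to encode the search text first. The sketch below uses `urllib.parse.quote` from the standard library, which the original cell does not use; it only builds the URL and leaves the `webbrowser.open()` call out.

```python
from urllib.parse import quote

search = 'Fredericksburg, VA'

# quote() percent-encodes characters that are not safe in a URL path
url = 'https://www.google.com/maps/place/' + quote(search)
print(url)
```

The resulting string can be passed to `webbrowser.open()` in place of the plain concatenation.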

### Using requests and Beautiful Soup

In [None]:
# Can use requests to get a webpage and then parse it using bs4
searchTerms = 'psychopathy'
res = requests.get('https://scholar.google.com/scholar?as_q=' + searchTerms + '&as_epq=&as_oq=&as_eq=&as_occt=any&as_sauthors=&as_publication=&as_ylo=2010&as_yhi=2021&hl=en&as_sdt=0%2C47')
res.raise_for_status()

In [26]:
searchTerms = 'sadism'
res2 = requests.get('https://scholar.google.com/scholar?scisbd=1&q=' + searchTerms + '&hl=en&as_sdt=0,47')
res2.raise_for_status()

In [27]:
# Create a soup object to parse the contents of the webpage and select items of interest 
soup = bs4.BeautifulSoup(res2.text, 'html.parser')
published = soup.select('span.gs_age')
title = soup.select('h3.gs_rt')
link = soup.select('div.gs_ri')


In [28]:
title_list = []
pub_date_list = []

for i in title:
    title_list.append(i.getText())

for i in published:
    pub_date_list.append(i.getText())

df = pd.DataFrame({'Title': title_list, 'Published Date': pub_date_list})
df.head()
#print('\n'.join(title_list) + ' ' + '\n'.join(pub_date_list))

#link[0].select('h3.gs_rt a')[0].get('href')

Index | Title                                              | Published Date
------|----------------------------------------------------|---------------
0     | Dark or disturbed?: Predicting aggression from...  | 3 days ago -
1     | Psychological and physical cues to vulnerabili...  | 7 days ago -
2     | [HTML][HTML] A qualitative analysis of sadisti...  | 7 days ago -
3     | Hormonal response to perceived emotional distr...  | 7 days ago -
4     | The Decadent Novel                                 | 8 days ago -


In [30]:
for item in link: 
    print(item.select('h3.gs_rt a')[0].get('href'))

https://onlinelibrary.wiley.com/doi/abs/10.1002/ab.21990
https://www.sciencedirect.com/science/article/pii/S0191886921005687
https://indexarticles.com/reference/british-journal-of-forensic-practice-the/a-qualitative-analysis-of-sadistic-endorsement-in-a-group-of-irish-undergraduates/
https://www.sciencedirect.com/science/article/pii/S0191886921005596
https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780190066956.001.0001/oxfordhb-9780190066956-e-16
https://link.springer.com/article/10.1007/s00414-021-02674-0
https://www.ejournals.eu/TP/Nr-5/art/19637/
https://link.springer.com/article/10.1007/s12144-021-02174-9
https://books.google.com/books?hl=en&lr=&id=99c7EAAAQBAJ&oi=fnd&pg=PT7&dq=sadism&ots=cajAUMrvBS&sig=NLhwm6V5gBQNH2Qc9-OZBfEVW8I
https://riviste.unimi.it/index.php/schermi/article/view/14214


Method      | Description
------------|--------------------------
.text       | Attribute holding the text of the response
.url        | Attribute holding the URL of the response
.select()   | Returns a list of elements
.getText()  | Returns the text of the element
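
Because a live page can change or block automated requests, it may help to practice `.select()` and `.getText()` on a small hand-written HTML string first. In the sketch below, the HTML, the class names, and the example.com links are all invented for illustration and are not Google Scholar's actual markup.

```python
import bs4

# A tiny hand-written page standing in for a downloaded res.text
html = """
<div class="result">
  <h3 class="title"><a href="https://example.com/a">First article</a></h3>
  <span class="age">3 days ago</span>
</div>
<div class="result">
  <h3 class="title"><a href="https://example.com/b">Second article</a></h3>
  <span class="age">7 days ago</span>
</div>
"""

soup = bs4.BeautifulSoup(html, 'html.parser')

# .select() takes a CSS selector and returns a list of matching elements
titles = [h3.getText() for h3 in soup.select('h3.title')]
ages = [span.getText() for span in soup.select('span.age')]

# Attributes such as href are read with .get()
links = [a.get('href') for a in soup.select('h3.title a')]

print(titles)
print(ages)
print(links)
```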

## Working with Excel and PDFs

For Excel files, use the `openpyxl` module, and for PDFs use either `pdfminer` or `PyPDF2`.  

*Note: PDFs must be opened in binary mode ('rb') with Python's `open` function*

Methods for Excel:
Method                | Description
----------------------|-------------------
.load_workbook()      | Loads an existing wkbk
.save()               | Saves the wkbk
.create_sheet()       | Creates a new sheet in current wkbk
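.get_sheet_by_name()  | Returns a sheet by name (deprecated in current openpyxl; use `wb['Sheet name']`)
.get_sheet_names()    | Returns a list of all sheet names (deprecated in current openpyxl; use `wb.sheetnames`)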
.Workbook()           | Creates a new wkbk 
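
A minimal round-trip sketch with `openpyxl` follows: it creates a workbook, saves it to a temporary folder, and reads it back. The file name, sheet name, and cell values are made up for illustration, and the sheet is accessed with `wb2["Cats"]` and `wb2.sheetnames` rather than the older `get_sheet_*` methods.

```python
import os
import tempfile
import openpyxl

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cats.xlsx")  # hypothetical file name

    # Workbook() creates a new workbook with one default sheet
    wb = openpyxl.Workbook()
    sheet = wb.create_sheet("Cats")
    sheet["A1"] = "Oliver"
    sheet["B1"] = 5
    wb.save(path)

    # load_workbook() reopens the saved file
    wb2 = openpyxl.load_workbook(path)
    sheet_names = wb2.sheetnames
    name = wb2["Cats"]["A1"].value
    age = wb2["Cats"]["B1"].value
    wb2.close()

print(sheet_names, name, age)
```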

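Methods for PDFs (PyPDF2 1.x names; PyPDF2 3.x renamed these to `PdfReader`, `PdfWriter`, `len(reader.pages)`, `reader.pages[n]`, and `.extract_text()`):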
Method                | Description
----------------------|------------------------
.PdfFileReader()      | Opens a PDF file
.PdfFileWriter()      | Creates a new PDF file
.numPages             | Attribute holding the number of pages in the PDF
.getPage()            | Returns a page from the PDF
.extractText()        | Returns the text from a page of the PDF

## Email and GUI Automation

For sending emails with Python, you can use the `smtplib` module but it is (somewhat) easier and safer to use the `ezgmail` module because it doesn't require you to have your username and password in the source code.

For GUI automation, use the `pyautogui` module.

In [None]:
import pyautogui, smtplib, ezgmail