# Topics
- Setting up the environment with Anaconda
- Using Jupyter notebooks
- Pip, packages and imports
- Variables and types (String, Int, Float, List)
- Functions
- Control Flow
- List comprehensions
- Web scraping with Requests and BeautifulSoup
- Writing text to a file

## Virtual environments and Anaconda
- An isolated environment lets each project have its own dependencies without conflicting with other projects
- Anaconda comes with its own environment manager and package manager, lets you easily choose the Python version for each environment, and ships with many of the standard packages used in scientific computing
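To confirm which environment a notebook or script is actually running in, a quick check like the following can help. It only reads standard `sys` attributes, and the paths it prints will depend on your own Anaconda setup.

```python
import sys

# Path of the Python interpreter running this code; when launched from a
# conda environment this typically points into that environment's folder
print(sys.executable)

# Version of the active interpreter, e.g. (3, 11, 4)
print(sys.version_info[:3])

# Root directory of the active installation or environment
print(sys.prefix)
```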

## Jupyter Notebooks
- Formerly known as IPython Notebooks
- Write and evaluate code cell by cell, without constantly rerunning whole scripts or relying on lots of print debugging
- Mix Markdown and HTML in with your code, which makes notebooks a great way to present code and data analysis

## Pip, Conda, Packages and Imports
- With Anaconda you get the conda package manager, but you can still use pip to install other packages into your environment
- Put your imports at the top of your script or notebook
- Import a whole module (`import os`)
- Import a module under an alias (`import pandas as pd`)
- Import part of a module (`from bs4 import BeautifulSoup`)
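A minimal sketch of the three import styles, using the standard-library `math` module so it runs in any environment (the notebook itself imports `requests`, `bs4`, `os`, and `pandas`):

```python
# Import the whole module; access its names through the module prefix
import math
print(math.sqrt(16))     # 4.0

# Import a module under a shorter alias
import math as m
print(m.pi)

# Import only specific names from a module
from math import sqrt, floor
print(floor(sqrt(20)))   # 4
```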

In [79]:
from bs4 import BeautifulSoup
import os
import requests
import pandas as pd

## Types and variables

In [7]:
# Strings
greeting = "Hello, I'm Scott. It's a pleasure to meet you."
print(greeting)
greeting

Hello, I'm Scott. It's a pleasure to meet you.


"Hello, I'm Scott. It's a pleasure to meet you."

In [12]:
# Find a letter by index
greeting[0]

'H'

In [13]:
# Get the length of a string
len(greeting)

46

In [14]:
# Count spaces in the string
greeting.count(' ')

8

In [15]:
# Slice to get the first 3 characters
greeting[0:3]

'Hel'

In [16]:
# Get the last three characters
greeting[-3:]

'ou.'

In [17]:
# Replace hello with goodbye
greeting.replace("Hello", "Goodbye")

"Goodbye, I'm Scott. It's a pleasure to meet you."

In [18]:
# String concatenation
"Hello" + "World"

'HelloWorld'
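Concatenation with `+` only works between strings. As a sketch of two common alternatives, f-strings build a string from variables, and `str.join` glues a list of strings together with a separator. The `age` value is made up for illustration.

```python
name = "Scott"
age = 30  # hypothetical value for illustration

# "Hello, " + age would raise a TypeError because age is an int, hence str(age)
concat = "Hello, " + name + "! You are " + str(age) + "."

# An f-string (Python 3.6+) evaluates the expressions inside {} and inserts them as text
formatted = f"Hello, {name}! You are {age}."

# join places the separator string between each pair of items
spaced = " ".join(["Hello", "World"])

print(concat)     # Hello, Scott! You are 30.
print(formatted)  # Hello, Scott! You are 30.
print(spaced)     # Hello World
```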

In [11]:
# Numbers: an integer and a float
first_num = 10
second_num = 5.467
print(type(first_num), type(second_num))

<class 'int'> <class 'float'>


In [19]:
# Addition
1 + 5

6

In [20]:
# Division
10 / 2

5.0

In [21]:
# Multiplication
5 * 2

10
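Division with `/` always returns a float in Python 3, which is why `10 / 2` above gives `5.0`. A few related arithmetic operators, as a quick sketch:

```python
true_div = 7 / 2     # true division, always a float: 3.5
floor_div = 7 // 2   # floor division: 3
remainder = 7 % 2    # remainder (modulo): 1
power = 2 ** 3       # exponent: 8
neg_floor = -7 // 2  # floor division rounds toward negative infinity: -4

print(true_div, floor_div, remainder, power, neg_floor)
```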

In [22]:
# Lists
drinks = ['coffee', 'tea', 'water']
drinks

['coffee', 'tea', 'water']

In [24]:
mixed = [2, 'hello', 10.5, 'here is a sentence']
mixed

[2, 'hello', 10.5, 'here is a sentence']

In [25]:
# Get item by index
drinks[2]

'water'

In [27]:
# Add an item to the end of the list
drinks.append('juice')
drinks

['coffee', 'tea', 'water', 'juice']
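Lists support the same indexing and slicing as strings, plus methods that change them in place. A short sketch continuing with the `drinks` list:

```python
drinks = ['coffee', 'tea', 'water', 'juice']

print(drinks[-1])       # last item: juice
print(drinks[1:3])      # slice: ['tea', 'water']
print('tea' in drinks)  # membership test: True
print(len(drinks))      # 4

# pop() removes the last item and returns it
last = drinks.pop()
print(last, drinks)     # juice ['coffee', 'tea', 'water']
```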

There are plenty of other data types that we aren't going to use today, such as sets, dictionaries, and tuples.
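For reference only, here is roughly what those types look like. This is a sketch with made-up values.

```python
# Tuple: like a list, but immutable
point = (3, 4)

# Set: unordered collection with no duplicates
unique = set(['tea', 'coffee', 'tea'])

# Dictionary: maps keys to values
ages = {'Scott': 30, 'Ashley': 28}  # hypothetical ages

print(point[0], len(unique), ages['Scott'])  # 3 2 30
```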

## Functions

At the most basic level, functions are chunks of reusable code.

In [28]:
# Define a function
def add(x, y):
    return x + y
add(1, 2)


3

In [34]:
def combine_arrays(array1, array2):
    # + on two lists returns a new list containing the items of both
    new_list = array1 + array2
    return new_list
first = ['hello', 2]
second = ['1', 10]
new = combine_arrays(first, second)
new

['hello', 2, '1', 10]
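Functions can also give parameters default values and be called with keyword arguments. As a sketch, a hypothetical `greet` function:

```python
def greet(name, greeting="Hello"):
    """Return a greeting for name; greeting defaults to 'Hello'."""
    return greeting + ", " + name + "!"

print(greet("Scott"))                 # Hello, Scott!
print(greet("Scott", greeting="Hi"))  # Hi, Scott!
```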

## Control flow 

In [47]:
# IF
name = "Bob"
if name == "Scott":
    print("Hi Scott!")
else:
    print("Who are you?")

Who are you?


In [39]:
name = "John"
def say_hello(name):
    return "Hello " + name + "!"
if name == "Bob":
    message = say_hello("Bob")
    print(message)
elif name == "Scott":
    message = say_hello("Scott")
    print(message)
else:
    print("Who are you?")

Who are you?
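The `Bob` and `Scott` branches above do the same thing, so they can be collapsed with the `in` operator. A sketch of the same logic, written with a hypothetical `respond` function that returns the message instead of printing it:

```python
def say_hello(name):
    return "Hello " + name + "!"

def respond(name):
    # `in` checks membership, replacing the separate elif branches
    if name in ("Bob", "Scott"):
        return say_hello(name)
    return "Who are you?"

print(respond("Scott"))  # Hello Scott!
print(respond("John"))   # Who are you?
```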


In [42]:
# FOR
names = ["Stu", "Scott", "Javier", "Ashley"]
for name in names:
    print(name, len(name))

Stu 3
Scott 5
Javier 6
Ashley 6
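When you need the position as well as the item, `enumerate` yields both, and `range` produces a sequence of numbers to loop over. A sketch using the same `names` list:

```python
names = ["Stu", "Scott", "Javier", "Ashley"]

# enumerate yields (index, item) pairs; start=1 numbers from 1 instead of 0
for i, name in enumerate(names, start=1):
    print(i, name)

# range(3) produces 0, 1, 2
squares = []
for n in range(3):
    squares.append(n * n)
print(squares)  # [0, 1, 4]
```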


In [44]:
for name in names[:3]:
    if len(name) > 5:
        print(name)

Javier


In [48]:
def add_one(num):
    return num + 1
nums = [1, 2, 3, 4]
plus = []
for num in nums:
    plus.append(add_one(num))
plus

[2, 3, 4, 5]

In [49]:
# ADVANCED: List Comprehensions
added = [add_one(num) for num in nums]
added

[2, 3, 4, 5]

In [52]:
long_names = [name.lower() for name in names[:3] if len(name) > 5]
long_names

['javier']
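A list comprehension with a condition is shorthand for a `for` loop with an `if` inside. A sketch checking that both forms produce the same result:

```python
names = ["Stu", "Scott", "Javier", "Ashley"]

# Loop version
loop_result = []
for name in names[:3]:
    if len(name) > 5:
        loop_result.append(name.lower())

# Comprehension version: [expression for item in iterable if condition]
comp_result = [name.lower() for name in names[:3] if len(name) > 5]

print(loop_result == comp_result)  # True
```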

## Web scraping with Requests and Beautiful Soup

### Scraping text

In [55]:
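url = "https://en.wikipedia.org/wiki/Stanford_University"
page = requests.get(url)
# "lxml" is a fast third-party parser (install it with conda or pip); "html.parser" is built into Python
soup = BeautifulSoup(page.text, "lxml")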
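By default `requests.get` does not raise an error on a 404 or 500 response, and without a `timeout` it can wait indefinitely. A slightly more defensive fetch helper, as a sketch; the function name `fetch_soup` and the `User-Agent` string are made up for illustration:

```python
import requests
from bs4 import BeautifulSoup

def fetch_soup(url, timeout=10):
    """Download url and parse it; raise requests.HTTPError for 4xx/5xx responses."""
    # Identify your script to the server; this string is a hypothetical example
    headers = {"User-Agent": "python-workshop-example/0.1 (contact: you@example.com)"}
    page = requests.get(url, headers=headers, timeout=timeout)
    page.raise_for_status()  # turns 404/500 responses into an exception
    return BeautifulSoup(page.text, "html.parser")  # built-in parser

# Usage (requires network access):
# soup = fetch_soup("https://en.wikipedia.org/wiki/Stanford_University")
```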


In [56]:
hatnote = soup.find('div', {'class': 'hatnote'})
hatnote

<div class="hatnote" role="note">"Stanford" redirects here. For other uses, see <a class="mw-disambig" href="/wiki/Stanford_(disambiguation)" title="Stanford (disambiguation)">Stanford (disambiguation)</a>.</div>

In [57]:
hat_text = hatnote.get_text()
hat_text

'"Stanford" redirects here. For other uses, see Stanford (disambiguation).'

In [62]:
main_text_area = soup.find('div', {'class': 'mw-content-ltr'})
main_text = main_text_area.find('p')
main_text.get_text()

'Stanford University, officially Leland Stanford Junior University,[8] is a private research university in Stanford, California, adjacent to Palo Alto and between San Jose and San Francisco. Its 8,180-acre (12.8\xa0sq\xa0mi; 33.1\xa0km2)[9] campus is one of the largest in the United States.[note 1] Stanford also has land and facilities elsewhere.[7][9]'
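`find` returns the first matching tag (or `None`), `find_all` returns every match, and `select_one`/`select` accept CSS selectors. A self-contained sketch against a small, made-up HTML snippet that mimics the Wikipedia structure above:

```python
from bs4 import BeautifulSoup

html = """
<div class="hatnote">See <a href="/wiki/Other">Other</a>.</div>
<div class="mw-content-ltr">
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ (with an underscore) is equivalent to {'class': 'hatnote'}
hatnote = soup.find("div", class_="hatnote")
print(hatnote.get_text())  # See Other.

# CSS selector: the first <p> inside the content div
first_para = soup.select_one("div.mw-content-ltr p")
print(first_para.get_text())  # First paragraph.

# find returns None when nothing matches, so check before calling get_text()
missing = soup.find("table")
print(missing is None)  # True

print([p.get_text() for p in soup.find_all("p")])  # ['First paragraph.', 'Second paragraph.']
```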

In [10]:
paragraphs = soup.find_all('p')
type(paragraphs)

bs4.element.ResultSet

In [12]:
for para in paragraphs:
    print(para.get_text())
    

Stanford University, officially Leland Stanford Junior University,[8] is a private research university in Stanford, California, adjacent to Palo Alto and between San Jose and San Francisco. Its 8,180-acre (12.8 sq mi; 33.1 km2)[9] campus is one of the largest in the United States.[note 1] Stanford also has land and facilities elsewhere.[7][9]
The university was founded in 1885 by Leland and Jane Stanford in memory of their only child, Leland Stanford Jr., who had died of typhoid fever at age 15 the previous year. Stanford was a former Governor of California and U.S. Senator; he made his fortune as a railroad tycoon. The school admitted its first students 125 years ago on October 1, 1891,[2][3] as a coeducational and non-denominational institution.
Stanford University struggled financially after Leland Stanford's death in 1893 and again after much of the campus was damaged by the 1906 San Francisco earthquake.[12] Following World War II, Provost Frederick Terman supported faculty and gr
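The scraped paragraphs contain citation markers such as `[8]` and `[note 1]`, as well as non-breaking spaces (`\xa0`). A hypothetical `clean_paragraph` helper, as a sketch, strips both with a regular expression. Note that it would also remove any legitimate text in square brackets.

```python
import re

def clean_paragraph(text):
    """Remove bracketed citation markers and normalize non-breaking spaces."""
    text = re.sub(r"\[[^\]]*\]", "", text)  # drop [8], [note 1], [7][9], ...
    text = text.replace("\xa0", " ")         # non-breaking space -> regular space
    return text.strip()

sample = "Its 8,180-acre (12.8\xa0sq\xa0mi)[9] campus[note 1] is large.[7][9]"
print(clean_paragraph(sample))  # Its 8,180-acre (12.8 sq mi) campus is large.
```

Applied to the scraped page, something like `[clean_paragraph(p.get_text()) for p in paragraphs]` would give a list of cleaned paragraph strings.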

### Another text scraping example

Let's create a list of URLs for the chapters of A Byte of Python, iterate over them, and get each page's content.

A Byte of Python is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, allowing us to copy the book, distribute it, transmit it, remix it and so forth. 

In [63]:
url = "https://python.swaroopch.com/"
page = requests.get(url)
soup = BeautifulSoup(page.text, "lxml")

In [69]:
chapters = soup.find('nav').find_all('a')
chapters

[<a class="custom-link" href="https://www.gitbook.com/book/swaroopch/byte-of-python" target="_blank">A Byte of Python</a>,
 <a href="./">
             
                     
                     Introduction
             
                 </a>,
 <a href="dedication.html">
             
                     
                     Dedication
             
                 </a>,
 <a href="preface.html">
             
                     
                     Preface
             
                 </a>,
 <a href="about_python.html">
             
                     
                     About Python
             
                 </a>,
 <a href="installation.html">
             
                     
                     Installation
             
                 </a>,
 <a href="first_steps.html">
             
                     
                     First Steps
             
                 </a>,
 <a href="basics.html">
             
                     
                     Basic

In [75]:
for a in chapters:
    print(a['href'])

https://www.gitbook.com/book/swaroopch/byte-of-python
./
dedication.html
preface.html
about_python.html
installation.html
first_steps.html
basics.html
op_exp.html
control_flow.html
functions.html
modules.html
data_structures.html
problem_solving.html
oop.html
io.html
exceptions.html
stdlib.html
more.html
what_next.html
floss.html
about.html
revision_history.html
translations.html
translation_howto.html
feedback.html
https://www.gitbook.com
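Slicing with `chapters[2:-1]` depends on the exact position of the non-chapter links. As an alternative sketch, the links can be filtered by their `href` instead, keeping only relative links that end in `.html`. The list below is a shortened copy of the output above:

```python
hrefs = [
    "https://www.gitbook.com/book/swaroopch/byte-of-python",
    "./",
    "dedication.html",
    "preface.html",
    "feedback.html",
    "https://www.gitbook.com",
]

# Keep relative links to .html pages; this skips external links and "./"
chapter_hrefs = [h for h in hrefs if h.endswith(".html") and not h.startswith("http")]
print(chapter_hrefs)  # ['dedication.html', 'preface.html', 'feedback.html']
```

On the real page the same filter would apply to `[a.get('href', '') for a in chapters]`, where `.get` avoids a `KeyError` on any `<a>` tag without an `href`.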


In [76]:
def create_url(url):
    return 'https://python.swaroopch.com/' + url
chapter_links = [create_url(a['href']) for a in chapters[2:-1]]
chapter_links

['https://python.swaroopch.com/dedication.html',
 'https://python.swaroopch.com/preface.html',
 'https://python.swaroopch.com/about_python.html',
 'https://python.swaroopch.com/installation.html',
 'https://python.swaroopch.com/first_steps.html',
 'https://python.swaroopch.com/basics.html',
 'https://python.swaroopch.com/op_exp.html',
 'https://python.swaroopch.com/control_flow.html',
 'https://python.swaroopch.com/functions.html',
 'https://python.swaroopch.com/modules.html',
 'https://python.swaroopch.com/data_structures.html',
 'https://python.swaroopch.com/problem_solving.html',
 'https://python.swaroopch.com/oop.html',
 'https://python.swaroopch.com/io.html',
 'https://python.swaroopch.com/exceptions.html',
 'https://python.swaroopch.com/stdlib.html',
 'https://python.swaroopch.com/more.html',
 'https://python.swaroopch.com/what_next.html',
 'https://python.swaroopch.com/floss.html',
 'https://python.swaroopch.com/about.html',
 'https://python.swaroopch.com/revision_history.html',
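Joining strings with `+` works here because every `href` is a bare filename. The standard library's `urllib.parse.urljoin` resolves relative links the way a browser does and leaves absolute URLs unchanged, which makes it a safer general-purpose replacement for `create_url`:

```python
from urllib.parse import urljoin

base = "https://python.swaroopch.com/"

print(urljoin(base, "dedication.html"))          # https://python.swaroopch.com/dedication.html
print(urljoin(base, "./"))                       # https://python.swaroopch.com/
print(urljoin(base, "https://www.gitbook.com"))  # https://www.gitbook.com

# Relative links are resolved against the page's directory, not the full page URL
print(urljoin("https://python.swaroopch.com/basics.html", "functions.html"))
# https://python.swaroopch.com/functions.html
```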

In [78]:
def get_page_text(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.text, "lxml")
    return soup.find('section', {'class': 'markdown-section'}).get_text()
for url in chapter_links:
    print(get_page_text(url))



Dedication
To Kalyan Varma and many other seniors at PESIT who introduced us to GNU/Linux and the world of open source.
To the memory of Atul Chitnis, a friend and guide who shall be missed greatly.
To the pioneers who made the Internet happen. This book was first written in 2003. It still remains popular, thanks to the nature of sharing knowledge on the Internet as envisioned by the pioneers.


Preface
Python is probably one of the few programming languages which is both simple and powerful. This is good for beginners as well as for experts, and more importantly, is fun to program with. This book aims to help you learn this wonderful language and show how to get things done quickly and painlessly - in effect 'The Anti-venom to your programming problems'.
Who This Book Is For
This book serves as a guide or tutorial to the Python programming language. It is mainly targeted at newbies. It is useful for experienced programmers as well.
The aim is that if all you know about computers is h
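The loop above requests every chapter back to back and will stop with an `AttributeError` if a page lacks the `markdown-section` element. A more defensive sketch separates parsing from downloading so the parsing step can be checked offline. The names `extract_section_text` and `get_pages` and the one-second delay are assumptions, not part of the original:

```python
import time

import requests
from bs4 import BeautifulSoup

def extract_section_text(html, selector="section.markdown-section"):
    """Return the text of the first element matching selector, or '' if absent."""
    soup = BeautifulSoup(html, "html.parser")
    section = soup.select_one(selector)
    return section.get_text() if section is not None else ""

def get_pages(urls, delay=1.0):
    """Download each URL in turn, pausing between requests to go easy on the server."""
    texts = {}
    for url in urls:
        page = requests.get(url, timeout=10)
        page.raise_for_status()
        texts[url] = extract_section_text(page.text)
        time.sleep(delay)
    return texts

# Offline check of the parsing step
print(extract_section_text('<section class="markdown-section"><h1>Dedication</h1></section>'))
```

With network access, `get_pages(chapter_links)` would return a dictionary mapping each chapter URL to its text.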

### Writing text to a file

In [85]:
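def create_filename(name, dirname):
    # e.g. 'dedication.html' -> 'dedication.txt' inside dirname
    stem, _ = os.path.splitext(name)
    return os.path.join(dirname, stem + '.txt')

def create_url(url):
    return 'https://python.swaroopch.com/' + url

def get_page_text(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.text, "lxml")
    return soup.find('section', {'class': 'markdown-section'}).get_text()

# open() does not create missing folders, so make sure 'chapters' exists first
os.makedirs('chapters', exist_ok=True)

for a in chapters[2:-1]:
    filename = create_filename(a['href'], 'chapters')
    text = get_page_text(create_url(a['href']))
    # Write as UTF-8 so non-ASCII characters don't fail on systems with a different default encoding
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(text)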
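To check the result, the saved files can be read back. A sketch using `pathlib`, demonstrated on a temporary directory so it does not touch the real `chapters` folder; the helper name `read_chapters` and the sample file contents are hypothetical:

```python
import tempfile
from pathlib import Path

def read_chapters(dirname):
    """Map each .txt file's name (without extension) to its contents."""
    return {p.stem: p.read_text(encoding="utf-8") for p in sorted(Path(dirname).glob("*.txt"))}

with tempfile.TemporaryDirectory() as tmp:
    # Write two small sample files, mirroring what the loop above produces
    Path(tmp, "dedication.txt").write_text("Dedication text", encoding="utf-8")
    Path(tmp, "preface.txt").write_text("Preface text", encoding="utf-8")
    chapters_text = read_chapters(tmp)

print(sorted(chapters_text))  # ['dedication', 'preface']
```

On the real output, `read_chapters('chapters')` would return one entry per saved chapter.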