# String methods, Iterables, and Functions

In this Jupyter Notebook we will cover basic string methods, work through a word-count example, and then look at iterables, conditional execution (if/elif/else statements), list comprehensions, and how to write your own functions.

# Notebook Outline
- [String Methods](#Native-String-Methods)
- [Word Count Examples](#Let's-do-some-word-counting!)
- [Loops and Conditionals](#Loops-and-Conditionals)
- [Nesting](#Nesting)
- [List Comprehensions](#Lists-Comprehensions)
- [Functions](#Functions)
- [Lambda Functions](#Lambda-Functions)

## Native String Methods

In [None]:
friend= '''You've got a friend in me
You've got a friend in me
When the road looks rough ahead
And you're miles and miles
From your nice warm bed
You just remember what your old pal said
Boy, you've got a friend in me
Yeah, you've got a friend in me
You've got a friend in me
You've got a friend in me
You got troubles, I've got 'em too
There isn't anything I wouldn't do for you
We stick together and see it through
'Cause you've got a friend in me
You've got a friend in me
Some other folks might be
A little bit smarter than I am
Bigger and stronger too
Maybe
But none of them will ever love you
The way I do
It's me and you, boy
And as the years go by
Our friendship will never die
You're gonna see it's our destiny
You've got a friend in me
You've got a friend in me
You've got a friend in me'''

How would we create a list where each element is a line from the song?

In [None]:
#let's take a look at the object 
friend

How many words are in the song?

In [None]:
total_words=friend.split()
total_words

In [None]:
len(total_words)

In [None]:
#Because each line is separated by '\n' we can use the .split() method!
lines=friend.split("\n")
lines
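
As a hedged aside, `str.splitlines()` is a built-in alternative to `.split("\n")`. The sketch below uses a short made-up string (`sample` is an illustrative name, not the notebook's `friend` variable) to show that both give the same list in this case.

```python
# `sample` is a hypothetical stand-in for the song text
sample = "You've got a friend in me\nWhen the road looks rough ahead\nFrom your nice warm bed"

by_split = sample.split("\n")        # split on the newline character
by_splitlines = sample.splitlines()  # split on line boundaries

print(by_split)
print(by_split == by_splitlines)
```

The two differ when the text ends with a newline: `.split("\n")` then yields a trailing empty string, while `.splitlines()` does not.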

In [None]:
#use the .lower() method to put everything in lowercase
print(friend.lower())

In [None]:
#use the .upper() method to put everything in uppercase
print(friend.upper())

In [None]:
# Finding length of a string
word='science'

In [None]:
print(len(word))

In [None]:
import string 
string.punctuation

In [None]:
friend.count("you've")

### Importance of lowering strings!

In [None]:
friend.lower().count("you've")
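
The two counts differ because `.count()` is case-sensitive. Here is a minimal sketch on a two-line sample (the name `sample` is an assumption made for illustration):

```python
# `sample` is a hypothetical two-line excerpt, used only for illustration
sample = "You've got a friend in me\nBoy, you've got a friend in me"

case_sensitive = sample.count("you've")            # misses the capitalized "You've"
case_insensitive = sample.lower().count("you've")  # lowercasing first catches both

print(case_sensitive, case_insensitive)
```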

# Let's do some word counting! 

In [None]:
help(str.maketrans)

## Must also remove punctuation!

In [None]:
# str.maketrans builds a translation table; characters passed as the third argument are deleted from the string.
remv_punc = str.maketrans('','',string.punctuation)

friend_nopunc=friend.translate(remv_punc).lower()
friend_nopunc[:50]
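
To see what the translation table does on a small scale, here is a minimal sketch using a one-line sample (`sample`, `table`, and `cleaned` are illustrative names, not part of the notebook):

```python
import string

# `sample` is a hypothetical one-line input
sample = "Boy, you've got a friend in me!"

# Third argument: every character listed here is mapped to None (deleted)
table = str.maketrans('', '', string.punctuation)
cleaned = sample.translate(table).lower()

print(cleaned)
```

Note that the apostrophe is removed too, so "you've" becomes "youve". This is why any stop-word list needs the same treatment before it is compared against the cleaned text.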

In [None]:
import pandas as pd

# Split on whitespace and on new lines
lines_words = [x.split(" ") for x in friend_nopunc.split("\n")]
lines_words[:50]

#### Yay! Now we have relatively clean data! 

## The slow way to count words

In [None]:
# Convert each to pandas Series object
lines_series = [pd.Series(x) for x in lines_words]
lines_series

In [None]:
# Call the method "value_counts" to count how often each word appears on each line.
lines_wcs = [x.value_counts() for x in lines_series]
lines_wcs

In [None]:
# Concatenate them all together
df = pd.concat(lines_wcs,axis=1)
df

In [None]:
#fill missing values with zero and transpose
df = pd.concat(lines_wcs,axis=1).fillna(0).T
df

In [None]:
df.sum()

In [None]:
df.sum().sort_values(ascending=False).head(10)

## Do not forget about stop words! 

In [None]:
from nltk.corpus import stopwords
dem_words = stopwords.words("english")
dem_words
#notice something about one of the top words in our song and its presence in the stop word list??

In [None]:
#make dem_words a string instead of a list for the next step
dem_wordz=' '.join(dem_words)
dem_wordz

In [None]:
#remove punctuation from stop words string
remv_punc = str.maketrans('','',string.punctuation)
dem_words_no_punc=dem_wordz.translate(remv_punc).lower()
dem_words_no_punc

In [None]:
#convert dem_words_no_punc back to a list 
dem_words_final=dem_words_no_punc.split(" ")
dem_words_final

In [None]:
df_no_sws = df.loc[:,~df.columns.isin(dem_words_final)]
df_no_sws.head()

#### What that one line did, step by step:
              df.loc[:,~df.columns.isin(dem_words_final)]
  
  1. First it took the column labels and evaluated a Boolean condition for each one: is this column label in the list "dem_words_final"? `df.columns.isin(dem_words_final)`
  2. Then it negated that condition using the character "\~"; now it assesses whether each column label __isn't__ in the list: `~df.columns.isin(dem_words_final)`
  3. Then, using the object `df`, it selects only the columns where the condition holds ("True"), meaning those column labels that are not in the list "dem_words_final": `df.loc[:,~df.columns.isin(dem_words_final)]`
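
The three steps above can be mirrored in plain Python. This is only an analogue of the pandas logic, with made-up labels and stop words (every name below is hypothetical), but it shows the mask, the negation, and the selection separately:

```python
# Hypothetical column labels and stop words, for illustration only
columns = ["friend", "in", "me", "youve", "got"]
stop_words = ["in", "me", "youve"]

# Step 1: one Boolean per column label: is it in the stop-word list?
is_stop = [label in stop_words for label in columns]

# Step 2: negate each Boolean (what "~" does to the pandas mask)
keep = [not flag for flag in is_stop]

# Step 3: keep only the labels whose Boolean is True
kept_columns = [label for label, flag in zip(columns, keep) if flag]

print(is_stop)
print(keep)
print(kept_columns)
```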
          

In [None]:
df_no_sws.sum().sort_values(ascending=False).head(5)

### The fast way!
#### It may not seem fast now, but if you are working with a lot of data this can be much, much quicker

In [None]:
dem_words_final #this was our final stop words list 

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
help(CountVectorizer)

In [None]:
string_list = [x for x in friend_nopunc.split("\n")]
string_list[:]

In [None]:
vct = CountVectorizer(stop_words=dem_words_final )
X= vct.fit_transform(string_list)
word_counts = pd.DataFrame(X.sum(axis=0))
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
word_counts.columns = vct.get_feature_names_out()
word_counts = word_counts.T

In [None]:
word_counts = word_counts[0].sort_values(ascending=False)
word_counts.head(5)

## Loops and Conditionals

### What the heck is an iterable???

### "Iterable is an object which can be looped over or iterated over with the help of a for loop. Objects like lists, tuples, sets, dictionaries, strings, etc. are called iterables. In short and simpler terms, iterable is anything that you can loop over." 
#### Source: [Analytics Vidhya](https://www.analyticsvidhya.com/blog/2021/07/everything-you-should-know-about-iterables-and-iterators-in-python-as-a-data-scientist/#:~:text=Iterable%20is%20an%20object%20which,that%20you%20can%20loop%20over.)
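
A small sketch of what the definition means in practice: each object below can be looped over, and `iter()`/`next()` show the step-by-step mechanism that a `for` loop relies on. The sample values are made up for illustration.

```python
# Each of these objects is iterable, so each works in a for loop
examples = [
    [1, 2, 3],          # list
    (4, 5),             # tuple
    "abc",              # string: iterates character by character
    {"a": 1, "b": 2},   # dictionary: iterates over its keys
]

for obj in examples:
    items = [item for item in obj]
    print(type(obj).__name__, items)

# iter() hands back an iterator; next() pulls items one at a time
it = iter("hi")
first = next(it)
second = next(it)
print(first, second)
```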

In [None]:
# print numbers in a certain range
for i in range(5):
    print (i)

In [None]:
for i in range(0,10,3): #Range from 0 up to (but not including) 10 with increment/step size 3
    print(i)

In [None]:
# loop of the cubed numbers:
for x in range(1,5):
    print(x**3)

In [None]:
# If we want to have as a result a list of numbers, we can do:
cubes=[]
for x in range(1,5):
    cubes.append(x**3)
    
cubes

### If Statement:

In [None]:
# if a is "truthy" (here, a non-zero number), print it
a = 3
if a:
    print('a =', a)
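
`if a:` does not test whether `a` exists; it tests whether the value counts as true. A minimal sketch, using `bool()` on a handful of sample values chosen for illustration:

```python
# bool() shows how `if` would treat each value
values = [3, 0, "", "text", [], [0], None]
truthiness = [bool(v) for v in values]

print(truthiness)
```

Zero, empty strings, empty lists, and `None` are treated as false; most other values are treated as true.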

In [None]:
if(1<2):
    print ("1 IS less than 2!")

### If else Statement:

In [None]:
# Example 1:
x=5
y=10
if(x>=5):
    if(y!=10):
        print ("option A")
    else:
        print ("option B")
else:
    if(y<11):
        print ("option C")
    else:
        print ("option D")

In [None]:
# Example 2:
x= range(10)
type(x)

In [None]:
for element in x:
    print(element)

In [None]:
for i in x:
    if i < 4:
        print(i)

In [None]:
for i in x:
    if i < 4:
        print(i)
    else:
        print('{} is not less than 4'.format(i))

In [None]:
for i in x:
    if i < 4:
        print(i)
    else:
        print(f'{i} is not less than 4')

### Elif Statement:

In [None]:
# elif specifies an alternative condition:
for i in x:
    if i < 4:
        print('{} is less than 4'.format(i))
    elif i > 4:
        print('{} is greater than 4'.format(i))

In [None]:
# we can add an else that catches everything else besides the two options at if and elif
for i in x:
    if i < 4:
        print('{} is less than 4'.format(i))
    elif i > 4:
        print('{} is greater than 4'.format(i))
    else:
        print('{} is equal to 4'.format(i))

Other examples of iterating:
- Finding the sum of the numbers in a list
- Finding the length of a string

In [None]:
# Let's define a list:
x = list(range(5)) 

In [None]:
print(x)

In [None]:
sum(x)

In [None]:
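k=5
total=0  # named "total" so we don't overwrite the built-in sum()
for i in range(k):
    total += i  # "+=" adds i to the running total ("=+" would just assign +i)
    print('we are in loop',i,'and the sum equals',total)

print('we finished!')
print('The sum of the first {} non-negative integers is {}'.format(k, total))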

In [None]:
word

In [None]:
length=0
for char in word:
    length+=1
print('The word {} has {} characters'.format(word,length))


## Nesting

In [None]:
#this is what a nested loop looks like: for each value of i, the inner loop runs through every value of j
print('i j')
for i in range(4):
    for j in range(3):
        print(i, j)

We have seen a couple of examples of for loops in Python. Sometimes, however, an alternative way to express them is useful, especially if the alternative is more efficient (in terms of code and time). In some cases we will use list comprehensions. Here we will see how that works:

## Lists Comprehensions

List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.



In [None]:
# We saw a for loop like the following above:
cubes=[]
for x in range(3,7):
    cubes.append(x**3)
    
cubes

In [None]:
# We can express the same result using a list comprehension as follows:
cubes = [x**3 for x in range(3,7)]
cubes

Compare the two code cells above. The list comprehension takes only one line of code!

A list comprehension consists of brackets containing an expression followed by a `for` clause, then zero or more `for` or `if` clauses.

If the expression is a tuple (e.g. the (x, y) in the following example), it must be parenthesized.

In [None]:
# Example 2:
# this listcomp combines the elements of two lists if the elements are different:
#x in range(3)= 0,1,2
#y in range(2,4)=2,3
[(x, y) for x in range(3) for y in range(2,4) if x != y]

In [None]:
# Which is equivalent to:
# Compare the order of the for loops here, with respect to the listcomp above
combs = [] #empty list
for x in range(3):
    for y in range(2,4):
        if x != y:
            combs.append((x, y))
combs

In [None]:
# Example 3:

# Here is a vector:
vec = [-4, -2, 0, 2, 4]
vec

In [None]:
# create a new list with the values doubled (x*2)
[x*2 for x in vec]


In [None]:
#  filter the list to exclude negative numbers
[x for x in vec if x>=0]

In [None]:
# apply the function abs() (absolute value) to all the elements of vec:
[abs(x) for x in vec]

In [None]:
# Example 4:
# We have a list whose elements contain extra white space, like:

colors = ['  green', '   blue  ', '     orange  ']
# We need to remove that space. We can do that with the method .strip():

[element.strip() for element in colors]

In [None]:
# Example 5:
# create a list of 2-tuples like (number, square)
# Note that the tuple must be parenthesized:
[(x, x**2) for x in range(6)]

In [None]:
# Example 6:
# flatten a list using a listcomp with two 'for'
vec = [[1,2,3], [4,5,6], [7,8,9]]
vec
#And say we want this: [1,2,3,4,5,6,7,8,9]

In [None]:
# First, what if we use only one for?:
[x for x in vec]

# we get the same nested list back!

In [None]:
# Using two for:
[num for x in vec for num in x]

# the first expression `num` is the outcome that we will see.
# the first for is to extract the elements (lists inside the list)
# the second for is to extract the numbers inside the list

## Functions

In [None]:
# function without an argument:

def greeting():
    print ('hello')

In [None]:
greeting()

In [None]:
# let's define a function with one argument, called name:
def greeting2(name):
    print ('Hello,',name,'!')

In [None]:
greeting2('Adam')

In [None]:
# define function with return value
def greetingstring(first,last):
    string="Hello {} {}, How are you? How about that invisible hand?".format(first,last)
    return string

In [None]:
greetingstring('Adam','Smith')

In [None]:
# write a function to square an integer

def square(x):
    x2 = x**2

In [None]:
# Note that the function will not return what we are expecting:
square(5)
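
What the call actually gives back is `None`, because a function without a `return` statement returns `None` by default. A minimal sketch (the name `square_no_return` is made up so it does not clash with the notebook's `square`):

```python
def square_no_return(x):
    x2 = x**2  # computed, but never handed back to the caller

result = square_no_return(5)

print(result)
print(result is None)
```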

In [None]:
# We need a return inside the function:
def square(x):
    x2 = x**2
    return x2


In [None]:
square(5)

In [None]:
# or, we can simply do:
def square(x):
    return x**2

In [None]:
square(3)

# Multiple Arguments in a function

## The good ol' Quadratic formula
$x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}$


In [None]:
def quad(a,b,c):
    return ((-b+((b**2) -(4*a*c))**.5)/(2*a),(-b-((b**2) -(4*a*c))**.5)/(2*a))

Find $x$: $x^2 - x - 6 = 0$

In [None]:
quad(1,-1,-6)
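
One way to sanity-check the result is to substitute each root back into the equation. The sketch below restates the same formula with an intermediate variable `disc` (an illustrative name) so that it is self-contained:

```python
def quad(a, b, c):
    # same formula as above: the "+" root first, then the "-" root
    disc = b**2 - 4*a*c
    return ((-b + disc**.5) / (2*a), (-b - disc**.5) / (2*a))

roots = quad(1, -1, -6)
print(roots)

# Substitute each root back into x^2 - x - 6; both should give 0
residuals = [r**2 - r - 6 for r in roots]
print(residuals)
```

Here the discriminant is $b^2-4ac = 1 + 24 = 25$, so the roots are $(1 \pm 5)/2$. When the discriminant is negative, `**.5` produces complex numbers in Python 3 rather than raising an error.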

## Lambda Functions

A lambda function/expression is a small **anonymous function** (we don't assign a name to it). 

syntax:

`lambda` *arguments*: *expression*

In the expression part, we can write only ONE expression (for example, `argument+2` is one expression). In the argument part, however, we can have multiple arguments.


In [None]:
x = lambda a : a**2
print(x(5))

In the example above, `x` represents a *function*, and we can use it in other functions (we will see this below). 

We can also use three arguments: 

In [None]:
x = lambda a, b, c : ((-b+((b**2) -(4*a*c))**.5)/(2*a),(-b-((b**2) -(4*a*c))**.5)/(2*a))
print(x(1,-1,-6))

**Why should we use this?**

Remember that everything is an object in Python. Functions are objects too, so we can use lambda functions inside a function. The power of lambda shows best when you use it as an anonymous function inside another function: 

In [None]:
# Assume that we want to create a function that adds 5 to its input.
# However, the input will be doubled before adding 5. We use a lambda function to double the number, and use it
# inside a function: 
def func(x):
    func2= lambda x: x*2
    return func2(x)+5


In [None]:
func(3)
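
Another common pattern is passing a lambda directly as an argument to a built-in function. A minimal sketch with `sorted()` and its `key` parameter, using made-up (word, count) pairs:

```python
# Hypothetical data: (word, count) pairs
pairs = [("friend", 8), ("got", 9), ("boy", 2)]

# sorted() calls the lambda on each pair to decide the ordering
by_count = sorted(pairs, key=lambda pair: pair[1], reverse=True)

print(by_count)
```

The lambda is never given a name; it exists only for the duration of the `sorted()` call.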

### Lambda function with Filter:
Here is another example: picking the even numbers out of a list.

In [None]:
# we can define a function that returns a Boolean (True/False): True if the remainder of dividing by 2 is zero.
#  The operator "%" divides the left-hand operand by the right-hand operand and returns the remainder. 
    
def is_even(n):
    return n%2==0


In [None]:
is_even(3)

In [None]:
nums = [2,4,7,5,8,10,15]

The `filter()` function constructs an iterator from the elements of an iterable for which a function returns true. `filter()` takes two arguments: a **function** that tests each element and returns true or false, and an **iterable** to be filtered, which could be a set, a list, a tuple, or any other iterable.


In [None]:
evens = list(filter(is_even,nums))

In [None]:
print(evens)

In the example above, we had to define the function in advance. We can save time and space by using a lambda function instead, as below:

In [None]:
evens = list(filter(lambda n: n%2==0, nums))

In [None]:
print(evens)

Here is another example of the lambda with filter:

In [None]:
# keep the numbers greater than or equal to 10
geq10 = list(filter(lambda n: n>=10, nums))

In [None]:
print(geq10)
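
For comparison, the same kind of filtering can be written as a list comprehension with an `if` clause. The sketch below repeats the `nums` list so that it runs on its own; `with_filter` and `with_listcomp` are illustrative names.

```python
nums = [2, 4, 7, 5, 8, 10, 15]

with_filter = list(filter(lambda n: n % 2 == 0, nums))
with_listcomp = [n for n in nums if n % 2 == 0]

print(with_filter)
print(with_filter == with_listcomp)
```

Which form to use is largely a matter of taste; the comprehension avoids the extra `list()` call.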

### Lambda Functions with Map:

The `map()` function in Python takes a function and an iterable (such as a list) as arguments. It returns an iterator that yields the result of applying the function to each item; wrapping it in `list()` gives a new list of the modified items.

In **map**: the function is applied to every object in the iterable, and every result is kept. In **filter**: the function is used as a test, and only the objects for which it returns *True* are kept.
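
To make the contrast concrete, here is a minimal sketch that passes the same test function to both `map()` and `filter()` (the names `is_big`, `mapped`, and `filtered` are made up for illustration):

```python
nums = [1, 2, 3, 4]
is_big = lambda n: n > 2

mapped = list(map(is_big, nums))       # one result per item
filtered = list(filter(is_big, nums))  # only the items that pass the test

print(mapped)
print(filtered)
```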


In [None]:
# to double each item. 
num = [5, 7, 22, 97, 54, 62] 
  
double = list(map(lambda x: x*2, num)) 
print(double) 

In [None]:
# convert each color to upper case:

colors = ['green', 'blue','orange','red']

uppered = list(map(lambda color: str.upper(color), colors))

In [None]:
print(uppered)

### Lambda Functions with Reduce:

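The `reduce()` function takes a function and an iterable (such as a list) as arguments. It applies the function cumulatively: first to the first two items, then to that result and the next item, and so on, until a single value remains. The `reduce()` function belongs to the `functools` module. 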


In [None]:
from functools import reduce

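numbers = [1,2,3,4,5,6,7]  # not named "list", which would overwrite the built-in list()

# the lambda multiplies, so the result is a product
product = reduce((lambda x,y: x*y ), numbers)

In [None]:
print(product)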

Here the result for the previous two elements is multiplied by the next element, and this goes on until the end of the list, like ((((((1*2)*3)*4)*5)*6)*7) = 5040.
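
To watch the intermediate results that `reduce()` carries forward, one option is `itertools.accumulate`, which yields each running value. This is a sketch that swaps in `operator.mul` for the lambda; `values`, `running`, and `final` are illustrative names.

```python
from functools import reduce
from itertools import accumulate
import operator

values = [1, 2, 3, 4, 5, 6, 7]

# accumulate() yields every intermediate result that reduce() passes along
running = list(accumulate(values, operator.mul))
final = reduce(operator.mul, values)

print(running)
print(final)
```

The last running value matches what `reduce()` returns.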