# **Introduction to text analysis in Python. Day 2 Part 1**

## *Dr Kirils Makarovs*

## *k.makarovs@exeter.ac.uk*

## *University of Exeter Q-Step Centre*

---


# **Welcome to Day 2!**

## **Today, we are going to look at:**

+ `f-strings` in Python
+ `for` loops, `if-else` statements and functions in Python (quick guide)
+ `.apply()` method to deal with multiple text entries in a dataframe
+ Descriptive text analysis

---



# **1. `f-strings` in Python**

`f-strings` provide a convenient way to embed Python expressions or objects in strings, for example inside `print` statements. We will use them quite a lot throughout the course.




In [None]:
# Simple print statement

print('Hello, World!')


In [None]:
# By default, anything inside the quotes is printed literally, even if it is the name of an object

greeting = 'Hello, World!'

print('A journey in programming starts with greeting') # here 'greeting' is just text, not the object defined above

print('A journey in programming starts with: ' + greeting) # you can overcome this by concatenating strings, but this quickly becomes clumsy

# f-strings allow for a more flexible and readable approach

print(f'A journey in programming starts with: {greeting}')


In [None]:
# f-strings can handle both objects and function calls

fruits = ['apple', 'banana', 'orange', 'pineapple']

print(f'There are {len(fruits)} fruits in a basket')
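
Anything that evaluates to a value can go inside the curly braces, and a format specifier after a colon controls how the value is displayed. The short sketch below illustrates this; the `price` value is made up purely for the example.

```python
# Hypothetical values, used only for illustration
fruits = ['apple', 'banana', 'orange', 'pineapple']
price = 2.5  # assumed price per fruit

# Any expression can go inside the curly braces
total_message = f'{len(fruits)} fruits cost {len(fruits) * price} in total'
print(total_message)

# A format specifier after a colon controls how a number is displayed
rounded_message = f'Price per fruit: {price:.2f}'
print(rounded_message)

# Methods can be called inside the braces as well
first_message = f'First fruit: {fruits[0].upper()}'
print(first_message)
```

Here `:.2f` asks for two digits after the decimal point, so `2.5` is shown as `2.50`.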


# **2. A quick guide on `for` loops, `if-else` statements and functions in Python**

`for` loops are used to go through the elements of an iterable object (a list, a string, a dictionary) one by one and run the same block of code for each of them.

`for` loops are helpful when you want to perform **the same** operation on **multiple elements** of an object
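
As a rough illustration, the same `for` syntax works on a list, a string and a dictionary. The objects below (`fruits`, `stock`) are made up for this sketch only.

```python
# Iterating over a list: one element per step
fruits = ['apple', 'banana']
collected = []
for fruit in fruits:
  collected.append(fruit)
print(collected)

# Iterating over a string: one character per step
letters = []
for character in 'abc':
  letters.append(character)
print(letters)

# Iterating over a dictionary: .items() gives key-value pairs
stock = {'apple': 3, 'banana': 5}
stock_lines = []
for name, amount in stock.items():
  stock_lines.append(f'{name}: {amount}')
print(stock_lines)
```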



In [None]:
# Say you have a string..

string = 'Today is the 9th of March, 2022'

# ..and you want to print each of its characters one by one

# Instead of running something like..
print(string[0])
print(string[1])
print(string[2])
print(string[3])
print(string[4])
#...
print(string[-1])


In [None]:
# ..you can iterate over a string and print out each of its elements one after another

for e in string:
  print(e)

# 'e' denotes 'element', but you can name it whatever you like
# Indentation matters: note that the statement within the loop - print(e) - starts with an indent
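
To see what indentation does in practice, here is a minimal sketch with a made-up three-letter string: indented lines belong to the loop and run once per element, while the first non-indented line runs only once, after the loop has finished.

```python
word = 'cat'  # hypothetical example string
inside_count = 0

for e in word:
  inside_count += 1  # indented: runs once for every character
  print(e)

print('Done!')  # not indented: runs once, after the loop has finished
print(inside_count)
```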
   

In [None]:
# for loops work well with lists. Say we break down the string into words

string_words = string.split(" ")

string_words


In [None]:
# Now we can iterate over each element of a list (that is, over each word) and calculate its length

# But first, let's print them out to make sure that it actually works

for e in string_words:
  print(e)


In [None]:
# Iterating over each element of a list and calculating its length

for e in string_words:
  word_length = len(e) # create a 'word_length' object within a loop that contains its length

  print(word_length) # print it out


In [None]:
# Make everything even more explicit

for e in string_words:
  word_length = len(e)

  print(f'The length of the word \'{e}\' is: {word_length}') # \' ensures that ' is treated merely as a symbol, not as a beginning/end of a string
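
An alternative to escaping with `\'` is to wrap the f-string in double quotes, so that single quotes inside it need no special treatment. A small sketch, using a made-up list of words:

```python
words = ['Today', 'is']  # hypothetical example list
lines = []

for e in words:
  # The string is delimited by double quotes, so ' is treated as a plain symbol
  line = f"The length of the word '{e}' is: {len(e)}"
  lines.append(line)
  print(line)
```

Both styles produce the same output; which one to use is mostly a matter of readability.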



In [None]:
# Clearly not all the elements in this list are words (e.g. '9th', '2022')

# Let's use if-else statement and .isalpha() method to print out only the length of words

# .isalpha() checks if all the characters in the text are letters

# Let's first see how it works:

for e in string_words:

  print(e.isalpha()) # run .isalpha() on each of the elements and print out its output


In [None]:
# More explicitly:

for e in string_words:

  print(f'Is \'{e}\' a word? {e.isalpha()}')


In [None]:
# Finally, this is how you can use if-else statement and .isalpha() method to print out only the length of words

for e in string_words: # for each element..

  word_length = len(e)

  if e.isalpha() == True: # if it is a word.. (same as 'if e.isalpha()')

    print(f'The length of the word \'{e}\' is: {word_length}') # ..print out its length

  else: # if it's not a word..

    print(f'Sorry, \'{e}\' is not a word!') #.. print out this statement
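
Note that `.isalpha()` also returns `False` for a word with punctuation attached, such as `'March,'`. One possible way around this, shown here only as a sketch, is to strip punctuation from both ends of each element before checking it. The names `sentence`, `cleaned` and `results` are introduced just for this example.

```python
from string import punctuation  # a string containing common punctuation characters

sentence = 'Today is the 9th of March, 2022'
results = {}

for e in sentence.split():
  cleaned = e.strip(punctuation)  # remove punctuation from both ends of the element
  results[e] = cleaned.isalpha()  # store whether the cleaned element is a word
  print(f"Is '{cleaned}' a word? {results[e]}")
```

With this extra step `'March,'` is treated as a word, while `'9th'` and `'2022'` are still rejected.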


*Now you try!*

Please take the following string: 

*The University of Exeter is a public research university in Exeter, Devon, South West England, United Kingdom.*

+ split it into words via `.split()`
+ for each word, check whether it starts with a capital letter via `.isupper()`
+ **if it does:** print out its uppercase version via `.upper()`
+ **if it doesn't:** print out its capitalized version via `.capitalize()`

In [None]:
string = 'The University of Exeter is a public research university in Exeter, Devon, South West England, United Kingdom.'



In [None]:
# Split the string into words:

string_words = string.split()

string_words


In [None]:
for e in string_words: # for each element (word) in a list..

  if e[0].isupper(): # if it starts with a capital letter..

    print(e.upper()) # ..print out its uppercase version

  else: # if it does not start with a capital letter..

    print(e.capitalize()) # ..print out its capitalized version


In [None]:
# Same loop but including f-strings:

for e in string_words:

  if e[0].isupper():

    print(f'The word \'{e}\' starts with a capital letter. Its uppercase version is \'{e.upper()}\'')

  else:

    print(f'The word \'{e}\' does not start with a capital letter. Its capitalized version is \'{e.capitalize()}\'')


In [None]:
# Finally, you can wrap up a sequence of steps into a single function

def modificator(text): # define a function called 'modificator' that takes some text (string) as an input

  # Then, within this function...
  string_words = text.split() # split the input text into words (use the 'text' argument, not the global 'string')

  for e in string_words: # for each element (word) in a list..

    if e[0].isupper(): # if it starts with a capital letter..

      print(e.upper()) # ..print out its uppercase version

    else: # if it does not start with a capital letter..

      print(e.capitalize()) # ..print out its capitalized version


In [None]:
# Now run the modificator function on a string that we have defined before

modificator(string)

# You get the same output as above!
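
A function can also hand its result back with `return` instead of printing it, which lets you store the output and reuse it later. The sketch below is one possible variant; the name `modify_words` is hypothetical and not part of the course material.

```python
def modify_words(text):  # hypothetical name, used only for this sketch
  modified = []  # an empty list to collect the results

  for e in text.split():  # for each word in the input text..
    if e[0].isupper():  # if it starts with a capital letter..
      modified.append(e.upper())  # ..store its uppercase version
    else:  # otherwise..
      modified.append(e.capitalize())  # ..store its capitalized version

  return modified  # give the list back to the caller


result = modify_words('The University of Exeter')
print(result)
```

Because the function returns a list, `result` can be passed on to other code rather than only being displayed.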


*Now you try!*

Please take the following string: 

*The University of Exeter is a public research university in Exeter, Devon, South West England, United Kingdom.*

Write a function called `long_short` that checks the length of each word and prints out the phrase:
+ 'This is a SHORT word: *\<word\>*' for those that have **fewer than 5 characters**
+ 'This is a MIDDLE-LENGTH word: *\<word\>*' for those that have **from 5 to 7 characters**
+ 'This is a LONG word: *\<word\>*' for those that have **8 or more characters**

In [None]:
string = 'The University of Exeter is a public research university in Exeter, Devon, South West England, United Kingdom.'



In [None]:
def long_short(text):

  string_words = text.split() # split the input text into words

  for e in string_words: # for each element (word) in a list..

    if len(e) < 5: # if word's length is less than 5 characters..

      print(f'This is a SHORT word: {e}') # ..print out this phrase

    elif 5 <= len(e) <= 7: # if word's length is from 5 to 7 characters..

      print(f'This is a MIDDLE-LENGTH word: {e}') # ..print out this phrase

    else: # if word's length is 8 or more characters..

      print(f'This is a LONG word: {e}') # ..print out this phrase


In [None]:
# Run the long_short function on the string

long_short(string)


# **That's the end of Part 1!**