Code taken from https://colab.research.google.com/drive/1TkjjiIrzq5wE1BF2DbOAqTKhmRgzSYVh#scrollTo=nNT03tbV_Rqk

In [1]:
from pathos.multiprocessing import ProcessPool
from textblob import TextBlob
from tqdm import tqdm 

# Create a process pool

Pools are a group of processes to which you send tasks. When you create one, you define the number of processes it holds. By default this is the number of CPU cores, but you can request more than that.

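Scheduling more processes than you have CPU cores can improve throughput when the tasks spend much of their time waiting, for example on network or disk I/O, because a core left idle by a blocked process can run another one. For purely CPU-bound work, processes beyond the core count mostly add overhead.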
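A minimal sketch of why oversubscription helps with wait-bound work. To keep it runnable without pathos, it uses the standard library's `multiprocessing.pool.ThreadPool`, which exposes the same `map` interface but runs threads instead of processes; `fake_io` and its 0.2 s sleep are hypothetical stand-ins for a network or disk wait. The idea carries over: while a worker sits idle waiting, extra workers let more of those waits overlap.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_io(task_id):
    """Hypothetical stand-in for an I/O wait such as a network request."""
    time.sleep(0.2)  # the worker is idle here, not using the CPU
    return task_id


def timed_run(workers, n_tasks=8):
    """Run n_tasks waits on a pool of `workers` and return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPool(processes=workers) as pool:
        results = pool.map(fake_io, range(n_tasks))
    return results, time.perf_counter() - start


# 8 tasks on 2 workers need about four rounds of waiting; on 8 workers, about one.
few_results, few_seconds = timed_run(workers=2)
many_results, many_seconds = timed_run(workers=8)
print(f"2 workers: {few_seconds:.2f}s, 8 workers: {many_seconds:.2f}s")
```

The worker count here is chosen independently of `os.cpu_count()`; for wait-bound tasks the useful limit is how many waits can overlap, not how many cores exist.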

In [2]:
pool = ProcessPool(nodes=3)

# Create a process pool with a specified number of nodes, in this case 3.
# The pool has 3 worker processes available for parallel execution.

# Functions

Map methods provided:
    
map         - blocking and ordered worker pool        [returns: list]

imap        - non-blocking and ordered worker pool    [returns: iterator]

uimap       - non-blocking and unordered worker pool  [returns: iterator]

amap        - asynchronous worker pool                [returns: object]

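Blocking: the call does not return until every job has finished (map hands back the complete list, while imap, uimap and amap return immediately).

Ordered: results come back in the same order as the inputs, even if the jobs finish in a different order (uimap instead yields each result as soon as its job finishes).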
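The blocking distinction shows up when you time how long each call takes to *return*. A sketch using the standard library's `ThreadPool` as a stand-in for the pathos pool (its `map` blocks and its `imap` returns an iterator immediately, matching the descriptions of map and imap above); `slow_square` and its 0.2 s delay are hypothetical:

```python
import time
from multiprocessing.pool import ThreadPool


def slow_square(x):
    """Hypothetical task: wait 0.2 s, then return x squared."""
    time.sleep(0.2)
    return x * x


with ThreadPool(processes=2) as pool:
    # Blocking: map() does not return until all four jobs are done (~0.4 s on 2 workers).
    start = time.perf_counter()
    squares = pool.map(slow_square, range(4))
    map_returned_after = time.perf_counter() - start

    # Non-blocking: imap() hands back an iterator at once...
    start = time.perf_counter()
    squares_iter = pool.imap(slow_square, range(4))
    imap_returned_after = time.perf_counter() - start
    # ...and the waiting happens while the iterator is consumed.
    lazy_squares = list(squares_iter)

print(squares, lazy_squares)
print(f"map() returned after {map_returned_after:.2f}s, imap() after {imap_returned_after:.4f}s")
```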

In [4]:
# pool.map(function, iterable1, iterable2, ...)
# Each iterable supplies one positional argument of the function, as with the built-in map().

pool.map(pow, [1,2,3,4], [5,6,7,8])

# The map() method applies pow() to pairs of values taken from the two input lists,
# returning a new list of the calculated values.
# Specifically, it computes pow(1, 5), pow(2, 6), pow(3, 7) and pow(4, 8).

[1, 64, 2187, 65536]
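Because the pathos `map` pairs up its iterables the same way the built-in `map()` does, the built-in gives an easy serial check. The standard library's `Pool.map`, by contrast, accepts only one iterable, so the equivalent there is `starmap` over `zip`-ed tuples. A small sketch comparing the two (the stdlib `ThreadPool` is used only so the example runs without pathos):

```python
from multiprocessing.pool import ThreadPool

bases = [1, 2, 3, 4]
exponents = [5, 6, 7, 8]

# Serial reference: the built-in map() calls pow(bases[i], exponents[i]).
serial = list(map(pow, bases, exponents))

# Standard-library pools take a single iterable, so the argument pairs are
# zipped into tuples and unpacked by starmap().
with ThreadPool(processes=3) as pool:
    parallel = pool.starmap(pow, zip(bases, exponents))

print(serial)    # [1, 64, 2187, 65536]
print(parallel)  # [1, 64, 2187, 65536]
```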

In [5]:
# Iterate through the returned data using imap
for x in pool.imap(pow, [1,2,3,4], [5,6,7,8]):
    print(x)

# The imap() method returns an iterator that yields results as they become available.
# It does not wait for all the results to be computed before yielding the first one.
# Because imap is ordered, each result is yielded once it and every earlier result
# are ready; uimap instead yields each result as soon as its own job finishes.

# map(), by contrast, waits until all the results are available before returning
# the final list of results.

1
64
2187
65536
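To see the difference between ordered and unordered iteration, give the jobs different durations so they finish out of order. A sketch using the standard library's `ThreadPool`, where `imap_unordered` plays the role of pathos' `uimap`; `sleep_then_return` and the delays are hypothetical:

```python
import time
from multiprocessing.pool import ThreadPool


def sleep_then_return(delay):
    """Hypothetical task: wait `delay` seconds, then return the delay."""
    time.sleep(delay)
    return delay


def record_arrivals(results, start):
    """Collect (value, seconds since start) as each result is yielded."""
    return [(value, time.perf_counter() - start) for value in results]


delays = [0.3, 0.1, 0.2]  # the first job is the slowest

with ThreadPool(processes=3) as pool:
    start = time.perf_counter()
    ordered = record_arrivals(pool.imap(sleep_then_return, delays), start)

    start = time.perf_counter()
    unordered = record_arrivals(pool.imap_unordered(sleep_then_return, delays), start)

# imap: input order, and nothing arrives until the 0.3 s job is done.
# imap_unordered: typically completion order, so the 0.1 s result arrives first.
for name, arrivals in (("imap", ordered), ("imap_unordered", unordered)):
    print(name, [(value, f"{seconds:.2f}s") for value, seconds in arrivals])
```

Ordered iteration is convenient when results must line up with their inputs; unordered iteration lets you start processing (or update a progress bar) as soon as any job is done.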


In [8]:
# Do an asynchronous map, then get the results
import time

results = pool.amap(pow, [1,2,3,4], [5,6,7,8])
# amap returns immediately; print a dot every 5 seconds until all jobs are done
while not results.ready():
    time.sleep(5)
    print(".", end=' ')

# Retrieve the results as a list
output = results.get()
# Print the output
print(output)

. [1, 64, 2187, 65536]
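The polling loop above checks only every 5 seconds and would keep looping forever if a job hung. A sketch of the same poll-then-get pattern with the standard library's `starmap_async` (its counterpart of `amap` for multi-argument functions; `ThreadPool` is used only so it runs without pathos), a shorter wait that wakes early once the jobs finish, and a deadline. `slow_pow`, the 0.05 s interval and the 10 s deadline are hypothetical choices:

```python
import time
from multiprocessing.pool import ThreadPool


def slow_pow(base, exponent):
    """Hypothetical task: wait 0.2 s, then compute base ** exponent."""
    time.sleep(0.2)
    return pow(base, exponent)


with ThreadPool(processes=4) as pool:
    # starmap_async returns an AsyncResult at once; the jobs run in the background.
    async_result = pool.starmap_async(slow_pow, zip([1, 2, 3, 4], [5, 6, 7, 8]))

    deadline = time.monotonic() + 10  # give up if the jobs take longer than 10 s
    polls = 0
    while not async_result.ready():
        if time.monotonic() > deadline:
            raise TimeoutError("jobs did not finish within 10 s")
        polls += 1
        print(".", end=" ")
        # Sleep for up to 0.05 s, waking early if every job finishes.
        async_result.wait(timeout=0.05)

    # get() returns the results in input order (and re-raises any worker exception).
    output = async_result.get()

print(output)
```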


# Build your function

First, let's build a function that takes a line of text and returns its sentiment.

# Then we need a function that downloads the poems for us
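The cells that define these two functions are not included here. As a hypothetical sketch of their shape only: the serial loop further down does `score += process_poem(url)`, which suggests `process_poem` returns a list of per-line scores. In this sketch the download and scoring steps are passed in as parameters (`fetch` and `score_line`, both assumptions) so it can be tested without network access or TextBlob:

```python
from typing import Callable, List


def process_poem(url: str,
                 fetch: Callable[[str], str],
                 score_line: Callable[[str], float]) -> List[float]:
    """Hypothetical helper: download one poem and score each non-blank line.

    fetch(url) should return the poem's text; score_line(line) should return
    a sentiment score for one line. Both are injected so the sketch is testable.
    """
    text = fetch(url)
    return [score_line(line) for line in text.splitlines() if line.strip()]


# Offline usage with stand-ins: a canned poem and line length as a dummy "score".
poem = "Roses are red\n\nViolets are blue\n"
scores = process_poem("https://example.com/poem.txt",
                      fetch=lambda url: poem,
                      score_line=lambda line: float(len(line)))
print(scores)  # [13.0, 16.0]
```

In the notebook itself, `score_line` could be `lambda line: TextBlob(line).sentiment.polarity` (polarity ranges from -1 to 1), and the two extra arguments could be bound with `functools.partial(process_poem, fetch=..., score_line=...)` so the result takes a single URL, as `pool.uimap(process_poem, urls)` expects. pathos serializes with `dill`, so such lambdas can be sent to its worker processes.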


# Let's check what one of these poems looks like

In [19]:
#Serial Processing
#score = []

#for url in tqdm(urls, position=0 ): #position=0 forces the bars into the same line when printing
#    score += process_poem(url)


In [20]:
#Print the scores

#print(score[:10])

In [21]:
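#score = []

# Use a separate loop variable so each poem's scores are added to `score` instead of overwriting it
#for poem_scores in tqdm(pool.uimap(process_poem, urls), total=len(urls), position=0):
#    score += poem_scores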
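When accumulating results from `uimap`, the loop variable must differ from the accumulator; reusing the name `score` for both would overwrite the list on every pass. A sketch of the intended pattern that runs offline, using the standard library's `imap_unordered` (the counterpart of `uimap`) and a hypothetical `fake_process_poem`. Because results arrive in completion order, the combined list may differ in order from the serial one, so the comparison uses sorted lists:

```python
from multiprocessing.pool import ThreadPool


def fake_process_poem(url):
    """Hypothetical stand-in for process_poem: returns one score per 'line'."""
    return [float(len(url)), float(len(url)) / 2]


urls = ["https://example.com/a", "https://example.com/bb", "https://example.com/ccc"]

# Serial version: extend one flat list with each poem's scores.
serial_score = []
for url in urls:
    serial_score += fake_process_poem(url)

# Parallel version: results arrive in completion order, so use a distinct name
# for each poem's scores and keep `parallel_score` as the accumulator.
parallel_score = []
with ThreadPool(processes=3) as pool:
    for poem_scores in pool.imap_unordered(fake_process_poem, urls):
        parallel_score += poem_scores

print(sorted(serial_score) == sorted(parallel_score))  # True
```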

# Parallel processing

pool = ProcessPool(nodes=3)