# Hi!
##### A little bit of context: I was asked whether it was possible to get Google Trends "interest" values for different companies across a period of time. While googling for a way to do this, I came across the "pytrends" library (https://pypi.org/project/pytrends/), which lets you pull data from Google Trends using Python, with certain conditions of course.

##### A main limitation for the task at hand was that the list of keywords to look up on Google Trends was too big, as you can only provide 5 keywords per search. On top of that, the "interest scores" are normalized relative to the keywords you provide, ranging from 0 to 100. This means that the resulting score for a particular keyword can vary depending on which other keywords are in the same search. Therefore, the main obstacle was finding a workaround for a list of hundreds of words.
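
##### To make the scaling issue concrete, here is a minimal sketch based on a simplified model of the normalization (this is my assumption for illustration, not Google's actual formula): every value in a search is divided by the largest value in that search and multiplied by 100. The raw volumes are made up and the `normalize` helper is hypothetical.

```python
def normalize(raw_volumes):
    """Scale made-up raw volumes so that the largest one becomes 100."""
    peak = max(raw_volumes.values())
    return {term: round(100 * volume / peak) for term, volume in raw_volumes.items()}

# Hypothetical raw search volumes (not real Google data)
raw = {"Nike": 60, "Nestle": 20, "Google": 400}

# Same two companies, searched with and without a much bigger term
small_batch = normalize({term: raw[term] for term in ("Nike", "Nestle")})
big_batch = normalize(raw)

print(small_batch)  # Nike is the peak here, so it gets 100
print(big_batch)    # Google is the peak here, so Nike drops to 15
```

##### Under this simplified model the score of "Nike" changes from 100 to 15 just because "Google" joined the search, which is the inconsistency described above.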

##### Through several rounds of trial and error, trying to understand how the results could vary, I noticed that we could achieve a certain degree of consistency if every search included a term with relatively high scores, as it would serve as a "reference" for the scaling of the remaining terms, kind of like an anchor. In our particular case (based on my trial and error) I noticed that a term that gave consistent scores for the other terms in the keyword list was "Facebook", so I decided to always keep it in my keyword list as the "score anchor". The remaining challenge was to loop over the keywords and assemble a hierarchy of companies based on their average interest score. The following code aims to solve this situation.
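
##### As a rough illustration of why an anchor helps, the sketch below uses a simplified model (again my assumption, not Google's real formula) in which each search is scaled so that its largest value becomes 100. The raw volumes are invented, and `normalize` and `with_anchor` are hypothetical helpers.

```python
ANCHOR = "Facebook"

# Hypothetical raw search volumes (not real Google data)
raw = {"Facebook": 500, "Nike": 50, "Apple": 100, "Amazon": 200, "Nestle": 25}

def normalize(raw_volumes):
    """Scale made-up raw volumes so that the largest one becomes 100."""
    peak = max(raw_volumes.values())
    return {term: round(100 * volume / peak) for term, volume in raw_volumes.items()}

def with_anchor(terms):
    """Build a batch that always contains the anchor, then normalize it."""
    batch = {ANCHOR: raw[ANCHOR]}
    batch.update({term: raw[term] for term in terms})
    return normalize(batch)

batch_a = with_anchor(["Nike", "Apple"])
batch_b = with_anchor(["Amazon", "Nestle"])
batch_c = with_anchor(["Nike", "Amazon"])

# Nike and Amazon keep the same score no matter which batch they land in
print(batch_a["Nike"], batch_c["Nike"])
print(batch_b["Amazon"], batch_c["Amazon"])
```

##### This only holds while the anchor is the largest term in every batch. Real Google Trends data is also sampled and scaled over time, so expect the consistency to be approximate rather than exact.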

##### Please keep in mind that we can only make a certain number of consecutive requests with pytrends, as with too many we would start looking like a bot and get temporarily blocked with a "429 error". So if your list contains a lot of elements as well, you will likely want to split your data, or break the process into steps that you run on different occasions.
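
##### One common way to soften the 429 problem is to wait and retry with increasing pauses. The sketch below is a generic helper and not part of pytrends: `call_with_backoff`, its parameters and the simulated failure are all hypothetical. Because the exact exception pytrends raises may depend on the installed version, the helper simply takes the exception types to retry on as a parameter.

```python
import time

def call_with_backoff(func, retries=4, base_delay=60, retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a retryable error wait base_delay * 2**attempt seconds and try again."""
    for attempt in range(retries):
        try:
            return func()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)

# Demonstration with a fake request that fails twice before succeeding
attempts = {"count": 0}

def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("429 Too Many Requests (simulated)")
    return "ok"

waits = []  # record the pauses instead of actually sleeping
result = call_with_backoff(flaky_request, base_delay=1, sleep=waits.append)
print(result, waits)
```

##### With pytrends you would wrap the "build_payload" and "interest_over_time" calls in a small function and pass that function to the helper. How long Google keeps blocking after a 429 is not something I can state for sure, so treat the default delays as a starting guess.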

# Alright then! 

##### The first thing we will do is install pytrends and create a request object with "TrendReq()". We will then run an "interest over time" request for a sample keyword list.

###### Please remember that you may find more details regarding pytrends here: https://pypi.org/project/pytrends/

In [2]:
#install pytrends
!pip install pytrends
#import the libraries
import pandas as pd                        
from pytrends.request import TrendReq
pytrend = TrendReq()
#provide your search terms
kw_list=['Facebook', 'Apple', 'Amazon', 'Netflix', 'Google']
#get interest over time for your search terms
pytrend.build_payload(kw_list=kw_list)
df = pytrend.interest_over_time()
df



date,Facebook,Apple,Amazon,Netflix,Google,isPartial
2018-03-04,96,6,18,7,68,False
2018-03-11,100,6,18,7,68,False
2018-03-18,96,6,18,7,69,False
2018-03-25,99,7,17,8,64,False
2018-04-01,94,6,18,8,64,False
...,...,...,...,...,...,...
2023-01-29,31,7,20,9,57,False
2023-02-05,30,7,19,9,54,False
2023-02-12,32,7,19,8,55,False
2023-02-19,32,7,19,9,53,False


In [3]:
#Now let's ingest the company list into the notebook. I purposely created a much shorter version in the "small sample" sheet for illustration purposes.
df = pd.read_excel('listado_empresas.xlsx', sheet_name='small_sample')

#Let's just call the list object as its Spanish translation "lista", quite creative I am aware
lista = df.Nombre.tolist()

#Checking the content of "lista"
print("The elements of the list are: ", lista)

#Getting the length of the "lista" object, as it will be useful for looping later.
lista_len = len(lista)

#Now we are going to define a function called "chunks" which will help us split the list content into sub-chunks of lists in order to be able 
#to iterate over them.
#Disclaimer: The remaining code for this cell is not of my creation, I found it here when I got stuck at the start: 
#https://github.com/GeneralMills/pytrends/issues/485
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

#Now let's create a "grupos" object (Spanish for groups, I know I know) which will make a big list formed of all the small lists from "chunks".
#For each of the sublists, we will be adding "Facebook", as it will help us gain consistency in the scores.
grupos = [ ['Facebook'] + lst  for lst in list(chunks(lista, 3))]
print("The grupos object consists of: ", grupos)

The elements of the list are:  ['Nike', 'Google', 'Apple', 'Amazon', 'Microsoft', 'Netflix', 'Nestle']
The grupos object consists of:  [['Facebook', 'Nike', 'Google', 'Apple'], ['Facebook', 'Amazon', 'Microsoft', 'Netflix'], ['Facebook', 'Nestle']]


In [4]:
#Now let's create a DataFrame containing only the scores for Facebook. We will call this our "main dataframe" as it will help us anchor the other results.  
#We are using parameters on TrendReq to set the host language to US English (hl) and the time zone offset to 360 minutes, which corresponds to US CST (tz).
pytrends = TrendReq(hl='en-US', tz=360)
#Our only keyword will be "Facebook" for the moment.
kw_list = ['Facebook']
#We are passing our keyword list as well as the category number corresponding to "Business Finance" (https://github.com/pat310/google-trends-api/wiki/Google-Trends-Categories)
#We are also pulling scores from the last 5 years.
pytrends.build_payload(kw_list, cat=112, timeframe='today 5-y')
main_df = pytrends.interest_over_time()

In [5]:
main_df

date,Facebook,isPartial
2018-03-04,71,False
2018-03-11,79,False
2018-03-18,68,False
2018-03-25,77,False
2018-04-01,74,False
...,...,...
2023-01-29,35,False
2023-02-05,40,False
2023-02-12,40,False
2023-02-19,41,False


In [6]:
#Let's also remove the "isPartial" column
main_df.drop('isPartial', inplace=True, axis=1)

In [7]:
#We begin by creating a list called "hierarchy" which will start with "Facebook", as we are assuming it to be our "anchor" value that is higher than 
#the rest.
hierarchy = ['Facebook']

#Now we will iterate once per element in the list. On each pass we compare the average scores of the remaining elements, take the highest one, add it to
#the hierarchy list, and then run again with the elements that are left. This way we build a list of the companies ordered from the highest to the lowest
#average interest score. Notice that this isn't really necessary for this small sample, but if you wish to do this for a large number of terms, you
#will probably have to split the list and do this on different occasions, as Google only allows a certain number of requests and it
#could very well be impossible to finish everything in one go.
for l in range(lista_len):
    #we need to run the grupos object inside of the loop, as we will be updating the list elements, and therefore grupos too.
    grupos = [ ['Facebook'] + lst  for lst in list(chunks(lista, 3))]
    #getting each individual sub list from grupos...
    for g in grupos:
        #passing the sublist as a keyword element...
        kw_list = g
        pytrends.build_payload(kw_list, cat=112, timeframe='today 5-y')
        #getting dataframe with interest scores over time
        df = pytrends.interest_over_time()
        #now we are adding each element from the dataframe into our original main_df
        for keyword in g:
            main_df[keyword] = df[keyword]
            #print("main_df: ",main_df.head(2))
    #run this in case you want to check over "g"
    #print("g: ", g)
    #now we are getting the average values of the interest scores for each column (therefore each element)
    #we drop the "Facebook" anchor from the averages, so we never pick it by accident,
    #and then take the remaining element with the highest average for the hierarchy
    max_result = main_df.mean().drop('Facebook').idxmax()
    #Just if you want to check how the values behave...
    print("max_result: ", max_result)
    print("hierarchy pre append: ", hierarchy)
    #We append the max value that we got to the list
    hierarchy.append(max_result)
    print("hierarchy post append: ", hierarchy)
    #and remove it from the original list, so we can start again without our recently appended value and look for the next one in the hierarchy
    lista.remove(max_result)
    pytrends = TrendReq(hl='en-US', tz=360)
    #now we need to "reset" main_df so we can go another round with our updated list and keep filling the hierarchy
    kw_list = ['Facebook']
    pytrends.build_payload(kw_list, cat=112, timeframe='today 5-y')
    main_df = pytrends.interest_over_time()
    main_df.drop('isPartial', inplace=True, axis=1)

max_result:  Nike
hierarchy pre append:  ['Facebook']
hierarchy post append:  ['Facebook', 'Nike']
max_result:  Google
hierarchy pre append:  ['Facebook', 'Nike']
hierarchy post append:  ['Facebook', 'Nike', 'Google']
max_result:  Amazon
hierarchy pre append:  ['Facebook', 'Nike', 'Google']
hierarchy post append:  ['Facebook', 'Nike', 'Google', 'Amazon']
max_result:  Netflix
hierarchy pre append:  ['Facebook', 'Nike', 'Google', 'Amazon']
hierarchy post append:  ['Facebook', 'Nike', 'Google', 'Amazon', 'Netflix']
max_result:  Apple
hierarchy pre append:  ['Facebook', 'Nike', 'Google', 'Amazon', 'Netflix']
hierarchy post append:  ['Facebook', 'Nike', 'Google', 'Amazon', 'Netflix', 'Apple']
max_result:  Microsoft
hierarchy pre append:  ['Facebook', 'Nike', 'Google', 'Amazon', 'Netflix', 'Apple']
hierarchy post append:  ['Facebook', 'Nike', 'Google', 'Amazon', 'Netflix', 'Apple', 'Microsoft']
max_result:  Nestle
hierarchy pre append:  ['Facebook', 'Nike', 'Google', 'Amazon', 'Netflix', 'Ap

In [8]:
#And finally we can check our hierarchy list of average interest scores for our desired keyword list!
print(hierarchy)

['Facebook', 'Nike', 'Google', 'Amazon', 'Netflix', 'Apple', 'Microsoft', 'Nestle']
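
##### A side note on request counts: the loop above queries every group again on each pass, so the number of requests grows roughly with the square of the list length. Since "Facebook" is present in every group, one possible alternative (not the approach used in this notebook) is to fetch each group once and sort all the averages at the end. The sketch below assumes that scores from different groups are comparable thanks to the anchor, which is the same assumption the notebook already relies on. `rank_by_mean` is a hypothetical helper and the frames are made-up stand-ins for "interest_over_time()" results.

```python
import pandas as pd

def rank_by_mean(frames, anchor="Facebook"):
    """Rank every keyword found in the per-group frames by its mean score."""
    means = {}
    for frame in frames:
        for column in frame.columns:
            if column in (anchor, "isPartial"):
                continue  # skip the anchor and the bookkeeping column
            means[column] = frame[column].mean()
    return sorted(means, key=means.get, reverse=True)

# Made-up frames standing in for one interest_over_time() result per group
group_1 = pd.DataFrame({"Facebook": [100, 90], "Nike": [10, 12],
                        "Google": [60, 70], "isPartial": [False, False]})
group_2 = pd.DataFrame({"Facebook": [100, 90], "Amazon": [30, 34],
                        "isPartial": [False, False]})

ranking = ["Facebook"] + rank_by_mean([group_1, group_2])
print(ranking)
```

##### With real data you would append one frame per group inside a single loop over "grupos", which needs one request per group instead of one full round of requests per element.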


##### I am aware that this may be overkill for our small sample, but in case you have a huge number of elements and get blocked, it would be useful to run this code in small iterations (changing the <<range(lista_len)>> part at the start of the loop to small and increasing values) so that you can still get the hierarchy even if you run the code on different occasions.
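
##### If you do split the work across sessions, you need a way to remember where you stopped. A minimal sketch, assuming a JSON file is good enough for this: `save_progress`, `load_progress` and the file name are hypothetical and not part of pytrends.

```python
import json
import tempfile
from pathlib import Path

def save_progress(path, hierarchy, remaining):
    """Write the hierarchy built so far and the terms still pending to a JSON file."""
    Path(path).write_text(json.dumps({"hierarchy": hierarchy, "remaining": remaining}))

def load_progress(path, all_terms, anchor="Facebook"):
    """Return (hierarchy, remaining); start fresh if there is no checkpoint yet."""
    file = Path(path)
    if not file.exists():
        return [anchor], list(all_terms)
    state = json.loads(file.read_text())
    return state["hierarchy"], state["remaining"]

# Demonstration in a temporary folder so nothing is left on disk
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "progress.json"
    terms = ["Nike", "Google", "Apple"]

    # First session: nothing saved yet, so we start from the anchor
    hierarchy, remaining = load_progress(checkpoint, terms)
    hierarchy.append("Nike")   # pretend one pass of the loop picked "Nike"
    remaining.remove("Nike")
    save_progress(checkpoint, hierarchy, remaining)

    # Second session: pick up where the first one stopped
    resumed = load_progress(checkpoint, terms)

print(resumed)
```

##### In the notebook, "remaining" would play the role of "lista": load both lists before the loop, and save them after every append so that a 429 block costs you only the current pass.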