# The Light in French Literature 2 - keyword analysis
## - The use of lighting technologies and sensational words in the same sentence
Which writers make the most and least use of sensational words when writing about lighting technologies?

- How likely is it that a lighting technology is mentioned in a sentence containing “sensational vocabulary”?
- How many lighting technology sentences are there?
- How many of them contain sensational vocabulary?

    1. Split the text into sentences
    2. Count the sentences
    3. Count the sentences that mention a lighting technology
    4. Count the sentences that mention a lighting technology and contain “sensational vocabulary”
    5. Calculate the ratio of the sentences from step 4 to all sentences (step 2)
    6. Calculate the ratio of the sentences from step 4 to the lighting technology sentences (step 3)
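The six steps can be sketched on a toy text. This is a minimal sketch, not the notebook's code: it assumes that sentences end with a full stop, uses two short hypothetical keyword lists (`TECH_WORDS`, `SENSATION_WORDS`) in place of `technology_list.txt` and `sensation_list.txt`, and matches keywords as plain substrings.

```python
# Hypothetical keyword lists standing in for technology_list.txt and sensation_list.txt
TECH_WORDS = ["lampe", "bougie", "gaz"]
SENSATION_WORDS = ["éblouissant", "terrible", "douce"]


def analyse(text):
    # 1. split the text into sentences on full stops, dropping empty pieces
    sentences = [s.strip() for s in text.lower().split(".") if s.strip()]
    # 2. count the sentences
    n_total = len(sentences)
    # 3. sentences that mention at least one lighting technology
    tech = [s for s in sentences if any(w in s for w in TECH_WORDS)]
    # 4. of those, sentences that also contain sensational vocabulary
    tech_sens = [s for s in tech if any(w in s for w in SENSATION_WORDS)]
    return {
        "sentences": n_total,
        "tech": len(tech),
        "tech_and_sensation": len(tech_sens),
        # 5. share of all sentences
        "share_of_all": len(tech_sens) / n_total if n_total else 0.0,
        # 6. share of the lighting technology sentences
        "share_of_tech": len(tech_sens) / len(tech) if tech else 0.0,
    }


sample = ("La lampe jetait une lumière douce. Il sortit. "
          "Le gaz brûlait dans la rue. Une bougie éclairait la table.")
print(analyse(sample))
```

Returning 0.0 when a denominator is zero is a choice made for this sketch; pandas division gives NaN for 0/0 instead.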
    



In [1]:
import re
import os
import pandas as pd
from pathlib import Path
from tqdm.notebook import tqdm
tqdm.pandas()

# Prepare data
- The data is stored in a pandas DataFrame.
- A function detects the presence of keywords in sentences. The text is split into sentences (on full stops), and each sentence is checked for the keywords one by one. As soon as a keyword is found in a sentence, we stop searching that sentence for other keywords: we only register that the sentence contains a keyword and move on to the next sentence.
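Stopping at the first keyword means that we count sentences, not keyword occurrences. A minimal sketch with a hypothetical keyword list and sentence shows the difference; `any()` stops at the first match, just like the `break` in `count_kw`.

```python
import re

keywords = ["lampe", "lustre", "bougie"]  # hypothetical keyword list
sentence = "une vieille lampe un lustre dédoré et une bougie"

# Counting occurrences: every keyword found in the sentence adds one
hits = sum(1 for kw in keywords if re.search(kw, sentence))

# Registering presence: the sentence counts once, however many keywords it holds;
# any() stops checking as soon as one keyword matches
present = 1 if any(re.search(kw, sentence) for kw in keywords) else 0

print(hits, present)
```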

In [4]:
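input_dir = Path.cwd() / '../data/csv_files'  # directory holding the CSV files
# os.listdir returns files in arbitrary order; sort so that [-1] is the last file by name
csv_files = sorted(os.listdir(input_dir))
print(f'The file name is {csv_files[-1]}')
df = pd.read_csv(input_dir / csv_files[-1], sep='|')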



# split the text into sentences on full stops and count them
def count_sentences(text):
    # drop the empty string left after a final full stop (or between '..')
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    return len(sentences)

print('\n\nCount sentences:\n')
df['count_sentences'] = df['text'].progress_apply(count_sentences)


#### Read two lists of keywords - technology and emotion keywords ####

# path to keyword lists directory 
input_dir = Path.cwd() / '../data/key_word_lists' 


# get the tech words (one keyword per line; blank lines are skipped,
# because an empty pattern would match every sentence)
key_word_file_name = 'technology_list.txt'
with open(input_dir / key_word_file_name, 'r', encoding='utf-8-sig') as file:
    tech_key_words = [line.rstrip('\n') for line in file if line.strip()]

# get the emo words
key_word_file_name = 'sensation_list.txt'
with open(input_dir / key_word_file_name, 'r', encoding='utf-8-sig') as file:
    emo_key_words = [line.rstrip('\n') for line in file if line.strip()]



    
### Functions that count sentences containing keywords
def count_kw(text, key_word_list):
    """Count the sentences in text that contain at least one keyword from key_word_list."""
    sentences = [s.strip() for s in text.lower().split('.') if s.strip()]

    count = 0
    # take the sentences one by one
    for sent in sentences:
        # check the keywords (used as regular expressions) one by one
        for key_word in key_word_list:
            if re.search(key_word, sent):
                # the sentence holds a keyword: count it once
                count += 1
                # and move on to the next sentence without checking the remaining keywords
                break
    return count

def count_kw_in_kw_sent(text, key_word_list1, key_word_list2):
    """Count the sentences that contain a keyword from key_word_list1 (matched as a
    regular expression) and also a keyword from key_word_list2 (matched as a substring)."""
    sentences = [s.strip() for s in text.lower().split('.') if s.strip()]

    count = 0
    # take the sentences one by one
    for sent in sentences:
        # any() stops at the first match, so each sentence is counted at most once,
        # however many tech or sensation keywords it contains
        has_tech = any(re.search(key_word, sent) for key_word in key_word_list1)
        if has_tech and any(key_word in sent for key_word in key_word_list2):
            count += 1
    return count



######### Count keywords in sentences #####

print ('\n\nCount sentences with tech keywords:\n')    
df['s_lightning_kw'] = df['text'].progress_apply(lambda text : count_kw(text, tech_key_words))

######### Count sentences with lighting technology containing “emotion vocabulary”
print ('\n\nCount tech sentences that hold sensation keywords:\n') 
df['s_lightning_emotion_kw'] = df['text'].progress_apply(lambda text : count_kw_in_kw_sent(text, tech_key_words, emo_key_words))


The file name is text_data230826.csv


Count sentences:






Count sentences with tech keywords:






Count tech sentences that hold sensation keywords:



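Reading a keyword file with `file.read().split('\n')` yields an empty string when the file ends with a newline, and an empty pattern matches every sentence. A minimal sketch of the pitfall and of a loader that skips blank lines; the in-memory file content and the helper name `load_keywords` are hypothetical.

```python
import io
import re

# Hypothetical keyword file content, ending with a newline as most editors save it
raw = "lampe\nlustre\n"

naive = raw.split("\n")
print(naive)  # the last element is an empty string

# An empty pattern matches anywhere, so every sentence would count as a tech sentence
print(bool(re.search("", "il sortit sans rien dire")))


def load_keywords(handle):
    """Read one keyword per line, skipping blank lines."""
    return [line.rstrip("\n") for line in handle if line.strip()]


clean = load_keywords(io.StringIO(raw))
print(clean)
```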

In [46]:
# get relative data
new_df = df.copy()
new_df['lightning technology sentence share of all sentences'] = round(new_df['s_lightning_kw'] / new_df['count_sentences'], 3) 
new_df['relative value'] = round(new_df['s_lightning_emotion_kw'] / new_df['count_sentences'], 3) 
new_df['sen and lightning technology sentence share of lightning technology sentences'] = round(new_df['s_lightning_emotion_kw'] / new_df['s_lightning_kw'], 3)

new_df = new_df.iloc[:, [0, 1,2,3,6,7,8,9,10]]
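Written out, the three columns are per-document ratios. With $N$ the number of sentences (`count_sentences`), $T$ the number of sentences containing a lighting technology keyword (`s_lightning_kw`) and $E$ the number of those that also contain a sensational keyword (`s_lightning_emotion_kw`):

$$
\begin{aligned}
\text{lightning technology sentence share of all sentences} &= \frac{T}{N} \\
\text{relative value} &= \frac{E}{N} \\
\text{sen and lightning technology sentence share of lightning technology sentences} &= \frac{E}{T}
\end{aligned}
$$

If each sentence is counted at most once, then $E \le T \le N$, all three values lie between 0 and 1, and for $T > 0$ the relative value equals $\frac{T}{N} \cdot \frac{E}{T}$. For a novel with $T = 0$ the last ratio is $0/0$, which pandas returns as NaN.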

In [47]:
import plotly.express as px

In [50]:
y_var = 'relative value'

# Sort the documents by relative value and number them 1..n in that order,
# so the x-axis shows each document's rank
plot_df = new_df.sort_values(by=y_var, ascending=False).reset_index(drop=True)
plot_df['rank'] = plot_df.index + 1

fig = px.bar(plot_df,
             x='rank',
             y=y_var,
             hover_data=['author', 'title', 'year', 'relative value'])

# Update x-axis label
fig.update_xaxes(title_text='Document rank (highest relative value first)')

# Update y-axis label
fig.update_yaxes(title_text='Relative value')
print('\nThe presence of sentences with lighting technologies and sensational words.\n'
      'We have the relative frequency of those sentences compared to all sentences in each novel.')
fig.show()


The presence of sentences with lighting technologies and sensational words.
We have the relative frequency of those sentences compared to all sentences in each novel.


In [51]:
import plotly.io as io

html_snippet_start = '<!DOCTYPE html> <html> <head> <title>Title</title> </head> <body>'
html_snippet_end = ' </body></html> '

# Render the figure as an HTML fragment and wrap it in a minimal page
html_as_string = io.to_html(fig, full_html=False)
vis_in_html = html_snippet_start + html_as_string + html_snippet_end

# Use a context manager so the file is flushed and closed after writing
out_path = r'C:\Users\lakj\Documents\GitHub\Lighting in French Literature\visualisations\emo_in_light_sent1.htm'
with open(out_path, 'w', encoding='utf-8-sig') as of:
    of.write(vis_in_html)


# Keyword in context - or find a text snippet based on keywords and a range

We want to find a word, for example 'lumière', as well as related words, and we need some context around each hit, because we want to pin down exactly how lumière is used in the text.

For this we use `.{0,50}`: `.` matches any character except a newline, and `{0,50}` allows up to 50 of them, so we get up to 50 characters of context before the keyword. Writing `{0,50}` rather than `{50}` means that a keyword near the beginning of a text is still found. `\b` in front of the keyword is a word boundary: the keyword must start a word, so `\blustre` finds *lustre* and *lustres* but not the *lustre* hidden inside *illustre*. After the keyword, `.{0,40}` takes up to 40 characters of context.
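A minimal sketch of the window-and-boundary idea on a made-up sentence, using a smaller window of 10 characters so the result is easy to read.

```python
import re

text = "un homme illustre entra sous le lustre du salon"

# Without \b, 'lustre' is also found inside 'illustre'
print(re.findall(r"lustre", text))

# With \b, only the stand-alone word matches (and words starting with it, e.g. 'lustres')
print(re.findall(r"\blustre", text))

# Up to 10 characters of context on each side; {0,10} rather than {10}
# so that a keyword near the start or end of the text is still found
print(re.findall(r".{0,10}\blustre.{0,10}", text))
```

The last snippet starts mid-word ("a sous le ..."), because the window counts characters, not words; the Balzac snippets below are cut the same way.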

The pipe, `|`, means 'or' and allows us to search for more than one word at a time.
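With a longer keyword list, writing out one alternative per word gets unwieldy. A minimal sketch, assuming the keywords are plain words rather than regular expressions (hence `re.escape`): the list is joined with `|` inside a non-capturing group, so the context window is written only once. The keyword list and sentence are hypothetical.

```python
import re

keywords = ["lampe", "lustre", "bougie"]  # hypothetical keyword list

# (?:...) groups the alternatives without creating a capture group,
# so re.findall still returns the whole matched snippet
alternatives = "|".join(re.escape(kw) for kw in keywords)
pattern = re.compile(r".{0,20}\b(?:" + alternatives + r").{0,20}")

text = "elle posa la bougie sur la table puis alluma la lampe du salon"
for snippet in pattern.findall(text):
    print(snippet)
```

The same construction could be applied to `tech_key_words`, provided its entries are plain words and not patterns.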

Below, I look for the contexts of *lampe* and *lustre* in the texts by Balzac.

In [21]:
import re

# join all Balzac texts and collect up to 50 characters before and 40 after each keyword
text = ' '.join(df[df['author'] == 'Balzac']['clean_text'])
context = re.findall(r'.{0,50}\blampe.{0,40}|.{0,50}\blustre.{0,40}', text)
context

['a leur des aladins qui se laissent emprunter leur lampe ces admirables conseillers ont l’esprit',
 'carafe pleine d’eau et un verre et de l’autre une lampe le juge sonna l’huissier vint après que',
 'illotait dans les bas-reliefs en donnant tout son lustre à ce chef-d’œuvre des artisans du seizi',
 'rdonnance de napoléon en allumant son cigare à la lampe du portier quand joseph expliqua la pos',
 'es dorées étaient jaspées de vert-de-gris un joli lustre moitié cristal moitié en fleurs de porc',
 'e ménage elle frotta les meubles leur rendit leur lustre et tint tout au logis dans une propreté',
 'le madame hulot dit-il et il montrait une vieille lampe un lustre dédoré les cordes du tapis en',
 'tatuettes de plâtre jouant le bronze florentin le lustre mal ciselé simplement mis en couleur à ',
 'n elle trouva travaillant à la lueur d’une petite lampe dont la clarté s’augmentait en passant ',
 ' aidé sa femme à nettoyer les meubles à rendre du lustre aux plus petits objets en savonnant e
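Because `re.findall` returns non-overlapping matches, a keyword that falls inside the right-hand context of an earlier match is not reported on its own: in the output above, *lustre* in "une vieille lampe un lustre dédoré" only appears as context of *lampe*. A minimal sketch of an alternative, assuming plain-word keywords: match only the keyword with `re.finditer` and cut the context from the text by character offsets, so every occurrence gets its own row. The function name `kwic` and its parameters are hypothetical.

```python
import re


def kwic(text, keywords, left=50, right=40):
    """Return (left context, keyword, right context) for every keyword occurrence.

    Only the keyword itself is matched, so occurrences that lie inside
    another occurrence's context window are still reported.
    """
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")")
    rows = []
    for m in pattern.finditer(text):
        start, end = m.span()
        rows.append((text[max(0, start - left):start], m.group(), text[end:end + right]))
    return rows


sample = "il montrait une vieille lampe un lustre dédoré"
for before, kw, after in kwic(sample, ["lampe", "lustre"], left=15, right=10):
    # right-align the left context so the keywords line up in one column
    print(f"{before:>15} [{kw}] {after}")
```

Applied to the joined Balzac text, `kwic(text, ['lampe', 'lustre'])` should list the *lustre* next to *lampe* as a row of its own; aligning the keywords in one column also makes the list easier to scan.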