# Explore Topic Models
Explores the topic models of forum posts built with LDA (Latent Dirichlet Allocation).

## Data Sources
- topicmodel (created with 4-Generate_Topic_Models.ipynb)
- sentiments (created with 2-Sentiment_Analysis.ipynb)

## Changes
- 2020-09-17: Created

## TODO
- Tutorial
 - https://towardsdatascience.com/topic-modelling-in-python-with-nltk-and-gensim-4ef03213cd21
 - https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python


## Imports

In [1]:
import sqlite3
import pandas as pd
from pathlib import Path

## Functions

In [2]:
def create_connection(db_file):
    """Create a connection to the SQLite database at db_file.

    :param db_file: path to the database file
    :return: Connection object, or None if the connection fails
    """
    conn = None
    try:
        conn = sqlite3.connect(db_file)
    except sqlite3.Error as err:
        # Report the failure and fall through to return None
        print(err)
    return conn
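
`create_connection` returns an open connection that the caller is responsible for closing. As a minimal sketch of one way to guarantee that, the snippet below wraps a connection in `contextlib.closing`. The in-memory database and the `demo` table are stand-ins invented for this illustration, not part of the project database.

```python
import sqlite3
from contextlib import closing

# ":memory:" and the demo table are assumptions for this sketch only
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE demo (id INTEGER, text TEXT)")
    conn.execute("INSERT INTO demo VALUES (1, 'hello')")
    rows = conn.execute("SELECT id, text FROM demo").fetchall()

# The connection is closed once the with-block exits
print(rows)
```

The same pattern should work with the object returned by `create_connection`, provided the `None` failure case is checked first.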

## File Locations

In [3]:
p = Path.cwd()
path_parent = p.parents[0]
path_db = path_parent / "database" / "youbemomTables.db"
path_db = str(path_db)

## Load and Merge Data

In [6]:
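# Read both tables produced by the earlier notebooks
conn = sqlite3.connect(path_db)
sentiments = pd.read_sql_query("SELECT * from sentiments", conn)
topicmodel = pd.read_sql_query("SELECT * from topicmodel", conn)
# Combine column-wise; rows are aligned on the index (row position here)
df = pd.concat([sentiments, topicmodel], axis=1)
df.info()
# Keep only the topic columns and the post text
topic_text = df[['Dominant_Topic', 'Perc_Contribution', 'Topic_Keywords', 'text']]
topic_text.info()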

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27227 entries, 0 to 27226
Data columns (total 22 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   id                  27227 non-null  int64  
 1   family_id           27227 non-null  int64  
 2   message_id          27227 non-null  object 
 3   parent_id           27227 non-null  object 
 4   date_recorded       27227 non-null  object 
 5   date_created        27227 non-null  object 
 6   title               27227 non-null  object 
 7   body                27227 non-null  object 
 8   text                27227 non-null  object 
 9   before              27227 non-null  int64  
 10  during              27227 non-null  int64  
 11  march               27227 non-null  int64  
 12  period              27227 non-null  object 
 13  is_parent           27227 non-null  int64  
 14  neg_sentiment       27227 non-null  float64
 15  neu_sentiment       27227 non-null  float64
 16  pos_
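
Because `pd.concat(..., axis=1)` aligns rows on the index, the merge above is only correct if both tables were written in the same row order. The sketch below uses toy data to show how a positional concat and a key-based merge can disagree when the order differs. It assumes both tables share an `id` column, which the output above does not confirm for `topicmodel`.

```python
import pandas as pd

# Toy stand-ins for the two tables; the shared "id" key is an assumption
left = pd.DataFrame({"id": [1, 2, 3], "text": ["a", "b", "c"]})
right = pd.DataFrame({"id": [3, 1, 2], "Dominant_Topic": [0.0, 1.0, 2.0]})

# Positional alignment: row 0 of left is paired with row 0 of right
by_position = pd.concat([left, right.drop(columns="id")], axis=1)

# Key-based alignment: rows are paired on matching id values
by_key = left.merge(right, on="id", how="inner", validate="one_to_one")

print(by_position)
print(by_key)
```

If the real tables do share a key, `validate="one_to_one"` makes the merge raise an error on duplicate keys instead of silently multiplying rows.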

## Most Representative Post for Each Topic

In [12]:
# For each topic, keep the single post with the highest contribution score
topic_dominant = pd.DataFrame()
topic_grouped = topic_text.groupby('Dominant_Topic')
for _, grp in topic_grouped:
    top_post = grp.sort_values('Perc_Contribution', ascending=False).head(1)
    topic_dominant = pd.concat([topic_dominant, top_post], axis=0)
topic_dominant.reset_index(drop=True, inplace=True)
topic_dominant.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Dominant_Topic     7 non-null      float64
 1   Perc_Contribution  7 non-null      float64
 2   Topic_Keywords     7 non-null      object 
 3   text               7 non-null      object 
dtypes: float64(2), object(2)
memory usage: 352.0+ bytes
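
The per-topic selection above can also be written without an explicit loop. The following is a minimal sketch of that alternative using `groupby(...).idxmax()`, run on toy data with the same column names. The name `topic_dominant_alt` and the toy values are invented for the illustration.

```python
import pandas as pd

# Toy frame with the same columns as topic_text (values are made up)
toy = pd.DataFrame({
    "Dominant_Topic": [0.0, 0.0, 1.0, 1.0],
    "Perc_Contribution": [0.4, 0.9, 0.7, 0.2],
    "Topic_Keywords": ["a, b", "a, b", "c, d", "c, d"],
    "text": ["post 1", "post 2", "post 3", "post 4"],
})

# Index label of the highest-contribution row within each topic
best_rows = toy.groupby("Dominant_Topic")["Perc_Contribution"].idxmax()

# Select those rows and renumber them from zero
topic_dominant_alt = toy.loc[best_rows].reset_index(drop=True)
print(topic_dominant_alt)
```

This avoids growing a DataFrame inside a loop. When two posts tie for the top score in a topic, the two approaches may pick different rows.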


In [13]:
for t in topic_dominant['text']:
    print(t)

Write to the school board asking for a meeting for an IEP. Those psycho-social reports will support what is needed. 
HD NEws https://joshuaruiz2live.com/ https://ruizvsjoshua2.co/ https://livevsushd.com/cowboysvsbears/ https://livevsushd.com/patriotsvschiefs/ https://livevsushd.com/eaglesvsgiants/ https://livevsushd.com/dubairugby7s/ https://livevsushd.com/dubairugbysevens/ https://livevsushd.com/nationalfinalsrodeo/ https://livevsushd.com/nfrlivestream/ https://livevsushd.com/presidentscup/ https://livevsushd.com/ruizvsjoshuarematch2/ https://livevsushd.com/anthonyjoshuavsandyruizjr2/ https://livevsushd.com/ufc245/ https://presidentscup2019live.com/ https://golfpresidentscup.co/ https://jumanjithenextlevel.co/ https://ufc245live.co/
don't hand him the pill. Get one of the those 7 day pill boxes, and put the alarm on his phone. Keep the pill box next to his toothbrush. If he doesn't take it at 7pm, maybe he'll remember when he sees it at bedtime. (I also have a 17yo with adhd who takes