# Focused Coding
___

## Table of Contents

1. [Libraries](#libraries)
2. [Data Preprocessing](#preprocessing-data-and-grouping)
3. [Focused Coding Helper](#focused-coding----sample-helper)
___

## Libraries

All libraries needed to run the code are imported below. Install the packages with the `requirements.txt` file.

The documentation can be found in the [README.md](README.md) file.

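```python
# import packages
import pandas as pd
import os
import numpy as np
from tqdm import tqdm
from nltk.tokenize import TweetTokenizer
import nltk
# project helpers used below: remove_users, lowercase_text, remove_whitespace,
# remove_punctuation, lemmatize_words
from preprocessing_functions import *
```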

## Preprocessing Data and Grouping

```python
# load data
df = pd.read_csv('data/comments_final.csv')
df.head(3)
```

| Unnamed: 0 | video_id | published_at | like_count | text | author |
|---|---|---|---|---|---|
| 0 | uW6fi2tCnAc | 2023-02-19T21:22:45Z | 1 | The answer is if China and India don't help, i... | 0.0 |
| 1 | uW6fi2tCnAc | 2023-02-19T00:43:40Z | 2 | and that guy is an expert, we're screwed | 1.0 |
| 2 | uW6fi2tCnAc | 2023-02-18T22:57:38Z | 4 | Kennedy is a gem. | 2.0 |
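
The `Unnamed: 0` column in this output looks like a saved index. A minimal sketch, assuming `comments_final.csv` was written with `DataFrame.to_csv()` and its default `index=True`: passing `index_col=0` to `pd.read_csv` turns that first column back into the index instead of keeping it as data. The CSV text below is made up for illustration.

```python
import io

import pandas as pd

# Made-up CSV in the shape DataFrame.to_csv() writes by default:
# the index is saved as a first column with an empty header.
csv_text = """,video_id,like_count,text
0,vid_a,1,first comment
1,vid_a,2,second comment
"""

# Default read: the saved index comes back as an extra "Unnamed: 0" column.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())  # ['Unnamed: 0', 'video_id', 'like_count', 'text']

# index_col=0 uses that first column as the index instead.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.columns.tolist())  # ['video_id', 'like_count', 'text']
```

If this assumption holds for the real file, the later outputs would simply lose the `Unnamed: 0` column; the rest of the pipeline does not use it.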


```python
# group data by author and see the distribution of comments
# (author IDs were read as floats, e.g. 0.0; convert them to nullable integers)
df['author'] = pd.to_numeric(df['author'], errors='coerce').astype('Int64')
summary = df.groupby('author').agg(
    count=('author', 'size'),
    unique_video_id_count=('video_id', 'nunique')
).reset_index()
summary.sort_values(by='count', ascending=False, inplace=True)

# export the top 20 authors as a LaTeX table and print it
latex_table = summary.head(20).to_latex(index=False)
print(latex_table)
```

```latex
\begin{tabular}{rrr}
\toprule
author & count & unique_video_id_count \\
\midrule
17605 & 206 & 206 \\
3743 & 188 & 124 \\
18119 & 166 & 161 \\
17604 & 155 & 138 \\
18380 & 154 & 154 \\
17977 & 142 & 138 \\
17906 & 139 & 115 \\
17676 & 135 & 103 \\
17755 & 134 & 134 \\
14017 & 129 & 126 \\
14057 & 128 & 123 \\
2610 & 125 & 125 \\
25732 & 116 & 88 \\
1645 & 107 & 62 \\
17581 & 106 & 104 \\
6165 & 103 & 52 \\
6 & 99 & 98 \\
1106 & 97 & 75 \\
17608 & 92 & 80 \\
17590 & 87 & 79 \\
\bottomrule
\end{tabular}
```
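
`count` and `unique_video_id_count` differ whenever an author comments more than once on the same video (author 3743 wrote 188 comments across 124 videos). The toy example below uses made-up data to show the same named aggregation; the `comments_per_video` ratio is an illustrative addition, not part of the original analysis.

```python
import pandas as pd

# Made-up comments: author 1 comments twice on video v1 and once on v2.
toy = pd.DataFrame({
    "author": [1, 1, 1, 2],
    "video_id": ["v1", "v1", "v2", "v1"],
})

# Named aggregation: each keyword becomes an output column built from
# a (source column, aggregation function) pair.
summary = (
    toy.groupby("author")
    .agg(count=("video_id", "size"), unique_video_id_count=("video_id", "nunique"))
    .reset_index()
    .sort_values(by="count", ascending=False)
)

# Average comments per video; values above 1 mean repeat comments on a video.
summary["comments_per_video"] = summary["count"] / summary["unique_video_id_count"]
print(summary)
```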



```python
# keep a copy of the original text to compare capitalization, punctuation, etc. after preprocessing
extracted_col = df["text"].copy()

# preprocess the text column with helpers from preprocessing_functions.py
processed_df = (
    df.pipe(remove_users, 'text')
      .pipe(lowercase_text, 'text')
      .pipe(remove_whitespace, 'text')
      .pipe(remove_punctuation, 'text')
)
```
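
The helpers in `preprocessing_functions.py` are not shown here. The sketch below is a hypothetical re-implementation, consistent with the processed text in this notebook: apostrophes and commas are deleted without leaving a space (`don't` becomes `dont`) while curly quotes such as `“` survive, which matches removing ASCII `string.punctuation`. The `@mention` rule in `remove_users` is a pure assumption about what "users" means.

```python
import string

import pandas as pd

# HYPOTHETICAL versions of the helpers imported from preprocessing_functions;
# the real implementations may differ. Each takes a DataFrame and a column
# name and returns a new DataFrame, so they chain with .pipe().


def remove_users(df: pd.DataFrame, col: str) -> pd.DataFrame:
    # Assumption: "users" means @mentions such as "@SomeChannel".
    out = df.copy()
    out[col] = out[col].str.replace(r"@\w+", "", regex=True)
    return out


def lowercase_text(df: pd.DataFrame, col: str) -> pd.DataFrame:
    out = df.copy()
    out[col] = out[col].str.lower()
    return out


def remove_whitespace(df: pd.DataFrame, col: str) -> pd.DataFrame:
    # Collapse runs of spaces, tabs and newlines into one space and trim the ends.
    out = df.copy()
    out[col] = out[col].str.replace(r"\s+", " ", regex=True).str.strip()
    return out


def remove_punctuation(df: pd.DataFrame, col: str) -> pd.DataFrame:
    # Delete ASCII punctuation without inserting a space, so "we're" -> "were".
    out = df.copy()
    out[col] = out[col].str.translate(str.maketrans("", "", string.punctuation))
    return out


comments = pd.DataFrame({"text": [
    "Kennedy is a gem.",
    "and that guy is an expert,\n  we're screwed",
    "@SomeUser Thanks!",
]})
cleaned = (
    comments.pipe(remove_users, "text")
    .pipe(lowercase_text, "text")
    .pipe(remove_whitespace, "text")
    .pipe(remove_punctuation, "text")
)
print(cleaned["text"].tolist())
# ['kennedy is a gem', 'and that guy is an expert were screwed', 'thanks']
```

Returning a copy from each step keeps `df` itself unchanged, so the original text stays available for comparison.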

```python
# add the original text as `og_text` (pd.concat with axis=1 aligns rows on the index)
processed_df = pd.concat([processed_df, extracted_col.rename("og_text")], axis=1)
processed_df.head(3)
```

| Unnamed: 0 | video_id | published_at | like_count | text | author | og_text |
|---|---|---|---|---|---|---|
| 0 | uW6fi2tCnAc | 2023-02-19T21:22:45Z | 1 | the answer is if china and india dont help it ... | 0 | The answer is if China and India don't help, i... |
| 1 | uW6fi2tCnAc | 2023-02-19T00:43:40Z | 2 | and that guy is an expert were screwed | 1 | and that guy is an expert, we're screwed |
| 2 | uW6fi2tCnAc | 2023-02-18T22:57:38Z | 4 | kennedy is a gem | 2 | Kennedy is a gem. |


```python
# lemmatize to reduce words to their dictionary base form (lemma);
# lemmatize_words adds the result as a `lemmatized_text` column
processed_df['text'] = processed_df['text'].astype('str')
processed_df = lemmatize_words(processed_df, 'text')
```

```python
# replace missing lemmatized texts (NaN or the string 'nan') with empty strings
processed_df['lemmatized_text'] = processed_df['lemmatized_text'].apply(lambda x: '' if str(x) == 'nan' else x)
```
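
The `'nan'` check matters because, in pandas 2.x, `astype('str')` turns a missing comment into the literal string `'nan'`, which `lemmatize_words` presumably leaves as `'nan'`; a real `NaN` may also appear. The sketch below, on made-up values, compares the notebook's `apply` with a vectorized `replace` + `fillna` alternative that handles both cases.

```python
import numpy as np
import pandas as pd

# A normal lemmatized comment, a real missing value, and the literal string
# "nan" (what astype("str") produces from NaN in pandas 2.x).
lemmas = pd.Series(["kennedy be a gem", np.nan, "nan"])

# The notebook's cleanup: anything whose string form is "nan" becomes "".
via_apply = lemmas.apply(lambda x: "" if str(x) == "nan" else x)

# Vectorized equivalent: replace cells that are exactly "nan", then fill real NaN.
via_replace = lemmas.replace("nan", "").fillna("")

print(via_apply.tolist())    # ['kennedy be a gem', '', '']
print(via_replace.tolist())  # ['kennedy be a gem', '', '']
```

Both versions also blank a comment whose entire text is the word "nan", which is unlikely but possible in user comments.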

## Focused Coding -- Sample helper

This helper finds comments that contain important keywords, which can be identified first by exploring the comments or through topic modeling and Word2Vec.

How does it work?

- Input: one or more lemmatized keywords of interest in `substrings` or `string`; the list joined into `pattern` is the one used for filtering.
- Output: the number of matching comments and a random sample of them, giving a variety of comments on the topic or word of interest.
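
The steps of this helper can be wrapped in one reusable function. A minimal sketch: `find_comments` and its parameters are hypothetical names, and it assumes a DataFrame with a `lemmatized_text` column like `processed_df`. It escapes keywords so regex characters match literally, caps the sample size, and accepts an optional seed for reproducible samples.

```python
import re

import pandas as pd


def find_comments(df, keywords, column="lemmatized_text", n=10, seed=None):
    """Return up to n random rows whose `column` mentions any of `keywords`.

    Keywords should be lemmatized ("climate model", not "climate models")
    because they are matched against lemmatized text.
    """
    if not keywords:
        raise ValueError("keywords must not be empty (an empty pattern matches every row)")
    # Escape each keyword so characters such as "." or "+" match literally,
    # then join them into one case-insensitive regex alternation.
    pattern = "|".join(re.escape(k) for k in keywords)
    hits = df[df[column].str.contains(pattern, case=False, na=False)]
    # sample() raises if n exceeds the number of rows, so cap it;
    # a fixed seed makes the sample reproducible.
    return hits.sample(n=min(n, len(hits)), random_state=seed)


comments = pd.DataFrame({"lemmatized_text": [
    "all the climate model be bogus",
    "kennedy be a gem",
    "it be just a natural cycle",
]})
result = find_comments(comments, ["climate model", "natural cycle"], seed=0)
print(len(result))  # 2
```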

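```python
import re

# candidate lemmatized keywords; swap this list into the join below to search for them
substrings = ['god plan', 'greenhouse gas', 'natural cycle', 'hoax']
# keywords used for the current search
string = ['climate model']
# regex alternation of the keywords; re.escape makes special characters match literally
pattern = '|'.join(re.escape(s) for s in string)
```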
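
Before sampling, it can help to count hits per keyword to see which ones are worth exploring. The sketch below uses made-up lemmatized comments and the keywords from `substrings`. It also shows that `str.contains` does plain substring matching, so `hoax` matches inside `hoaxer`; adding `\b` word boundaries is an optional variant, not something the notebook does.

```python
import re

import pandas as pd

# Made-up lemmatized comments.
texts = pd.Series([
    "climate change be a hoax",
    "the hoaxer be wrong",
    "greenhouse gas trap heat",
    "it be god plan",
])
keywords = ["god plan", "greenhouse gas", "natural cycle", "hoax"]

# Substring matching (what str.contains does with an unanchored pattern):
# "hoax" also matches inside "hoaxer".
substring_hits = {
    k: int(texts.str.contains(re.escape(k), case=False).sum()) for k in keywords
}

# Whole-word matching: \b anchors stop "hoax" from matching "hoaxer".
word_hits = {
    k: int(texts.str.contains(rf"\b{re.escape(k)}\b", case=False).sum()) for k in keywords
}
print(substring_hits)  # {'god plan': 1, 'greenhouse gas': 1, 'natural cycle': 0, 'hoax': 2}
print(word_hits)       # {'god plan': 1, 'greenhouse gas': 1, 'natural cycle': 0, 'hoax': 1}
```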

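```python
# keep comments whose lemmatized text matches any keyword
filtered_df = processed_df[processed_df['lemmatized_text'].str.contains(pattern, case=False, na=False)]
pd.set_option('display.max_colwidth', None)  # show the full comment text
print(len(filtered_df))
# sample() raises an error if there are fewer matches than n, so cap it
filtered_df.sample(n=min(10, len(filtered_df)))
```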

202


Unnamed: 0,video_id,published_at,like_count,text,author,og_text,lemmatized_text
15320,7LVSrTZDopM,2023-02-02T16:18:31Z,2,would have been interesting to hear what dr lindzen thinks about copernicus satellite data with the amount of different measurements and level of detail to calibrate any digital model the uncertainty margins that are included in current digital models and have been decreasing over the years based on these data the fact that currently there is more mass created by men than biological mass bacteria plants animals on earth if buffers to absorb max and min temperatures and humidity like in forests especially rainforest would exist research papers that say some species are close to extinction why the amount of flooding events and droughtfires seems so much higher over the last decade to me it is also clear that politics has oversimplified the climate model by just focusing on one of the many greenhouse gasses and ignoring buffers or other disturbing processes but there is certainly effect of what we western humans have been doing during the last century e i think rich people will survive by climate engineering but many animals plants and other parts of earths beauty will be destroyed not quite something to be proud of id say,12701,"Would have been interesting to hear what Dr. Lindzen thinks about:\n-Copernicus satellite data with the amount of different measurements and level of detail to calibrate any digital model\n-The uncertainty margins that are included in current digital models and have been decreasing over the years, based on these data\n-The fact that currently there is more mass created by men than biological mass (bacteria, plants, animals) on Earth\n-If buffers to absorb max. and min. 
temperatures and humidity, like in forests (especially Rainforest) would exist\n-Research papers that say some species are close to extinction\n-Why the amount of flooding events and drought/fires seems so much higher over the last decade\n\nTo me it is also clear that politics has oversimplified the climate model by just focusing on one of the many greenhouse gasses and ignoring buffers or other disturbing processes, but there is certainly effect of what we, (Western) humans, have been doing during the last century. E \nI think rich people will survive by climate engineering, but many animals, plants and other parts of Earth's beauty will be destroyed. Not quite something to be proud of, I'd say.",would have be interest to hear what dr lindzen think about copernicus satellite data with the amount of different measurement and level of detail to calibrate any digital model the uncertainty margin that be include in current digital model and have be decrease over the year base on these data the fact that currently there be more mass create by men than biological mass bacteria plant animal on earth if buffer to absorb max and min temperature and humidity like in forest especially rainforest would exist research paper that say some specie be close to extinction why the amount of flood event and droughtfires seem so much high over the last decade to me it be also clear that politics have oversimplify the climate model by just focus on one of the many greenhouse gas and ignore buffer or other disturbing process but there be certainly effect of what we western human have be do during the last century e i think rich people will survive by climate engineering but many animal plant and other part of earths beauty will be destroy not quite something to be proud of id say
25960,niZdc2BEuqY,2024-05-11T00:12:35Z,0,all the climate models are bogus none can replicate the historic climate record because they all think co2 drives the climate,18017,All the climate models are bogus. None can replicate the historic climate record (because they all think CO2 drives the climate!),all the climate model be bogus none can replicate the historic climate record because they all think co2 drive the climate
62574,U0Xqu6BJHkk,2023-11-15T17:22:43Z,2,being agnostic is not prudent here sure no one can predict the climate but all rational persond should instead focus on what we absolutely know namely that the climate models are not and can not be good evidence for alarm this is the issue after the gretards have no clothes,14036,"Being agnostic is not prudent here. Sure.. no one can predict the climate but all rational persond should instead focus on what we absolutely know. Namely that the climate models are not, and can not be, good evidence for alarm. This is the issue after. The Gretards have no clothes.",be agnostic be not prudent here sure no one can predict the climate but all rational persond should instead focus on what we absolutely know namely that the climate model be not and can not be good evidence for alarm this be the issue after the gretards have no clothes
16689,7LVSrTZDopM,2023-01-11T21:21:15Z,0,co2 is a much smaller contributor to warming than water vapor climate models by the ipcc are terminally flawed the warming is primarily driven by the solar flux changes combined with earths orbital changes and axial processions earths magnetic field is slowly weakening as it prepares to swap solar winds now deposit more energy into the troposphere cloud formation is natural feedback increasing aldebo keeping temperatures lower co2 is far weaker than all of that and it lags temperature increases since the oceans will give up the gas as they warm humans contribute less than 5 of all co2 volcanos contribute as much as humans and ocean thermal vents and under ocean volcanos contribute more than that while the land and oceans contribute 90 most warming is not anthropogenic they best studies with back testing show this say 4 more co2 which is logarithmic in its ir absorption cant possibly contribute enough,8940,"CO2 is a much smaller contributor to warming than water vapor. Climate models by the IPCC are terminally flawed. The warming is primarily driven by the solar flux changes combined with earth's orbital changes and axial processions. Earth's magnetic field is slowly weakening as it prepares to swap, solar winds now deposit more energy into the troposphere. Cloud formation is natural feedback increasing aldebo keeping temperatures lower. CO2 is far weaker than all of that and it lags temperature increases since the oceans will give up the gas as they warm. Humans contribute less than 5% of all CO2. Volcanos contribute as much as humans and ocean thermal vents and under ocean volcanos contribute more than that while the land and oceans contribute 90+ %. Most warming is NOT anthropogenic. They best studies with back testing show this. 
Say, 4% more CO2 which is logarithmic in its IR absorption can't possibly contribute enough.",co2 be a much small contributor to warm than water vapor climate model by the ipcc be terminally flaw the warming be primarily drive by the solar flux change combine with earths orbital change and axial procession earth magnetic field be slowly weaken a it prepare to swap solar wind now deposit more energy into the troposphere cloud formation be natural feedback increase aldebo keep temperature low co2 be far weak than all of that and it lag temperature increase since the ocean will give up the gas a they warm human contribute less than 5 of all co2 volcano contribute as much a human and ocean thermal vent and under ocean volcano contribute more than that while the land and ocean contribute 90 most warming be not anthropogenic they best study with back test show this say 4 more co2 which be logarithmic in it ir absorption cant possibly contribute enough
50565,Wq5YUhpmnPc,2024-01-09T03:07:08Z,95,this is how i feel about the worthless climate models noaa uses “it doesnt matter how beautiful your theory is it doesnt matter how smart you are if it doesnt agree with experiment its wrong” ― richard p feynman,3743,"This is how I feel about the worthless climate models NOAA uses:\n\n“It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong.”\n\r\n― Richard P. Feynman",this be how i feel about the worthless climate model noaa use “ it doesnt matter how beautiful your theory be it doesnt matter how smart you be if it doesnt agree with experiment it wrong ” ― richard p feynman
36285,fBZIM4rNq78,2023-03-26T18:18:41Z,6,i spent 8 years as a meteorologist then another 12 years as a senior simulation analyst do you know what a climate model is it is a vast simplification of the real world you feed your model your simplistic assumptions and then it regurgitates a solution that seems “credible” to the model developer note the word credible credible means believable credible does not mean the truth climate models don’t create truth or valid climate predictions they just deliver credible believable to the climatologist results the same holds true with virus propagation modeling no truth just credible results based on somebodys spurious opinion,18738,"I spent 8 years as a meteorologist, then another 12 years as a senior simulation analyst. Do you know what a climate model is? It is a vast simplification of the real world. You feed your model your simplistic assumptions and then it regurgitates a solution that seems “credible” to the model developer. Note the word credible. Credible means believable. Credible does not mean the truth. Climate models don’t create truth or valid climate predictions; they just deliver credible (believable to the climatologist) results. \r\nThe same holds true with virus propagation modeling. No truth, just credible results based on somebody's spurious opinion.",i spend 8 year a a meteorologist then another 12 year a a senior simulation analyst do you know what a climate model be it be a vast simplification of the real world you fee your model your simplistic assumption and then it regurgitate a solution that seem “ credible ” to the model developer note the word credible credible mean believable credible do not mean the truth climate model don ’ t create truth or valid climate prediction they just deliver credible believable to the climatologist result the same hold true with virus propagation model no truth just credible result base on somebody spurious opinion
51911,NlNm4OBuEWo,2024-01-30T14:02:15Z,9,i ran across her a while ago when she discussed some random thing i thought was interesting but she made herself impossible to take serioulsy when she addressed the issue of climate being a chaotic system that can not be modeled which creates a problem for climate alarmists because climate models can not make predictions about climate change because they all share a major technical glitch which is that hey dont exist so presents this pseudomodel which creates a threateninglooking whirly sort of readout which she asserts allows us to infer that theres going to be temperature rise which will cost boatloads of money as if the people who cant make computer models of climate can make economic predictions any more than they can make climate predictions,25362,"I ran across her a while ago when she discussed some random thing I thought was interesting, but she made herself impossible to take serioulsy when she addressed the issue of climate being a chaotic system that can not be modeled, which creates a problem for climate alarmists because climate models can not make predictions about climate change, because they all share a major technical glitch, which is that hey don't exist. 
So presents this pseudo-model which creates a threatening-looking whirly sort of readout, which she asserts allows us to infer that there's going to be temperature rise, which will cost boatloads of money, as if the people who can't make computer models of climate, can make economic predictions any more than they can make climate predictions.",i run across her a while ago when she discuss some random thing i think be interest but she make herself impossible to take serioulsy when she address the issue of climate be a chaotic system that can not be model which create a problem for climate alarmist because climate model can not make prediction about climate change because they all share a major technical glitch which be that hey dont exist so present this pseudomodel which create a threateninglooking whirly sort of readout which she assert allow u to infer that theres go to be temperature rise which will cost boatload of money a if the people who cant make computer model of climate can make economic prediction any more than they can make climate prediction
56365,BmdjUYYNeSI,2023-08-29T13:01:31Z,1,i wonder if you have seen sabine hossenfelders explanation of anthropogenic climate change she mentions stratospheric cooling being a prediction of climate models and a nobel prize for the prediction i didnt understand the significance of this and have watched it more than once,22102,"I wonder if you have seen Sabine Hossenfelder's explanation of anthropogenic climate change, she mention's Stratospheric cooling being a prediction of climate models (and a Nobel prize for the prediction) i didn't understand the significance of this and have watched it more than once?",i wonder if you have see sabine hossenfelders explanation of anthropogenic climate change she mention stratospheric cool be a prediction of climate model and a nobel prize for the prediction i didnt understand the significance of this and have watch it more than once
57847,sMn5i8FX0xU,2023-09-27T07:14:51Z,6,2938 if you add up all the error bars in all the climate models the error is 8 degrees c to 8 degrees c out to 2100,25760,29:38. If you add up all the error bars in all the climate models the error is +8 degrees C to -8 degrees C out to 2100,2938 if you add up all the error bar in all the climate model the error be 8 degree c to 8 degree c out to 2100
36106,cI1utwowtoY,2023-09-26T06:54:21Z,0,theres a channel which features space weather on youtube its called smasho mash and ive been following it for a few years the owner is a solar scientist and despite many claiming that the current solar cycle 25 is the minimum hes been observing that the last cycle solar cycle 24 was actually the minimum and the data bears this out as we are already seeing far more sunspots than should be seen if this was a minimum what this means is that the 20 year pause in temperatures was actually down to the advent of the solar minimum between december 2008 and the end in december 2019 decreased solar activity also meant that there was less solar wind and coronal mass ejections and this allowed the cosmic rays to increase the amount of cloud cover which cooled the planet through cloud condensation nuclei allowing more clouds to form its really very easy to see that carbon dioxide is not the only thing that changed since the little ice age despite claims to the contrary the increase of solar activity since the minimums over the last 600 years accounts for all the global warming that weve seen carbon dioxide plays a minuscule part in the climate when compared to water vapor and solar activity just recently there was a video on channel american thought leaders called nobel laureate john clauser climate models miss key variable and he backs up that the missing piece of the climate models is clouds who would have thought that william wordsworth was a climate scientist,12011,"There's a channel which features Space Weather on YouTube. It's called smAsho mAsh and I've been following it for a few years. 
The owner is a Solar Scientist and despite many claiming that the current Solar Cycle 25 is the minimum, he's been observing that the last cycle Solar Cycle 24 was actually the minimum and the data bears this out as we are already seeing far more sunspots than should be seen if this was a minimum.\n\nWhat this means is that the 20 year pause in temperatures was actually down to the advent of the Solar Minimum between December 2008 and the end in December 2019. Decreased solar activity also meant that there was less solar wind and Coronal Mass Ejections and this allowed the Cosmic Rays to increase the amount of cloud cover which cooled the planet through Cloud Condensation Nuclei allowing more clouds to form.\n\nIt's really very easy to see that Carbon Dioxide is not the only thing that changed since the Little Ice Age. Despite claims to the contrary, the increase of solar activity since the minimums over the last 600 years accounts for all the Global Warming that we've seen. Carbon Dioxide plays a minuscule part in the Climate when compared to water vapor and solar activity. 
\n\nJust recently there was a video on channel American Thought Leaders called ""Nobel Laureate John Clauser: Climate Models Miss Key Variable"" and he backs up that the missing piece of the Climate Models is CLOUDS.\n\nWho would have thought that William Wordsworth was a Climate Scientist?!",there a channel which feature space weather on youtube it call smasho mash and ive be follow it for a few year the owner be a solar scientist and despite many claim that the current solar cycle 25 be the minimum he be observe that the last cycle solar cycle 24 be actually the minimum and the data bear this out a we be already see far more sunspot than should be see if this be a minimum what this mean be that the 20 year pause in temperature be actually down to the advent of the solar minimum between december 2008 and the end in december 2019 decrease solar activity also mean that there be less solar wind and coronal mass ejection and this allow the cosmic ray to increase the amount of cloud cover which cool the planet through cloud condensation nucleus allow more cloud to form it really very easy to see that carbon dioxide be not the only thing that change since the little ice age despite claim to the contrary the increase of solar activity since the minimum over the last 600 year account for all the global warming that weve see carbon dioxide play a minuscule part in the climate when compare to water vapor and solar activity just recently there be a video on channel american thought leader call nobel laureate john clauser climate model miss key variable and he back up that the miss piece of the climate model be cloud who would have think that william wordsworth be a climate scientist
