# Exercise: Computational Linguistics over Reddit Data

For this project we are going to ingest Reddit posts, process the data and perform computational linguistics over the Reddit posts.

This project will build off of some work you have previously done. However, beyond that exercise of processing and cataloging the feeds, in this instance you will access the referenced Reddit post and perform computational linguistics over the post itself.

![DataScraper_To_NLP.png MISSING](../images/DataScraper_To_NLP.png)

---

### From the site:

reddit: https://www.reddit.com/  
Reddit gives you the best of the Internet in one place. Get a constantly updating feed of breaking news, fun stories, pics, memes, and videos just for you.


### From Wikipedia:
Reddit is an American social news aggregation, web content rating, and discussion website. 
Registered members submit content to the site such as links, text posts, and images, 
which are then voted up or down by other members. 
Posts are organized by subject into user-created boards called "subreddits", 
which cover a variety of topics including news, science, movies, video games, music, books, fitness, food, and image-sharing. 
Submissions with more up-votes appear towards the top of their subreddit and, if they receive enough votes, ultimately on the site's front page. 



#### Sample Posting:

The below link is an example post from someone that was tinkering with sentiment analysis; specifically they looked at the text of [Moby Dick](https://en.wikipedia.org/wiki/Moby-Dick).

**Spoiler:** The conclusion was that the book is rather negative in sentiment.
It is after all, about vengeance!

https://www.reddit.com/r/LanguageTechnology/comments/9whk23/a_simple_nlp_pipeline_to_calculate_running/



### From: https://www.redditinc.com/
![REDDIT_About.png MISSING](../images/REDDIT_About_latest.png)

---

## Data Acquisition


### Example Code:

In this exercise, we will be using Reddit API for fetching the latest messages. We can also fetch recent posts from Reddit using web feeds (check [here](./rss-feeds.ipynb)), but it seems our IP got banned for excessive requests to Reddit over the last few days. So we will be using Reddit API for which you are required to create your Reddit account and an app. 

Follow [this article](https://gilberttanner.com/blog/scraping-redditdata) to create your credentials. 

### Using Reddit API

For fetching Reddit data using API, we will be using a Python wrapper to Reddit API: [PRAW: The Python Reddit API Wrapper](https://github.com/praw-dev/praw)

Documentation: https://praw.readthedocs.io/en/latest/index.html

In [None]:
import praw

reddit = praw.Reddit(client_id='6Zlpzqm03D7YBnlC7eP5dQ', 
                     client_secret='nNTvXV_P1mrbda7buX1e4ufASxi_bA', 
                     user_agent='WebScraping')


In [None]:
# get 10 hot posts from the MachineLearning subreddit
hot_posts = reddit.subreddit('datascience').hot(limit=100)  # hot posts

# new_posts = reddit.subreddit('datascience').new(limit=10)  # new posts

# get hottest posts from all subreddits
# hot_posts = reddit.subreddit('all').hot(limit=10)


In [None]:
all_posts = list(hot_posts)  

# this line will initiate the fetching of posts as PRAW use a lazy approach (i.e, fetch when required)
# this part is done to avoid calling Reddit API multiple times while developing our code 

In [None]:

for post in all_posts:
    print(f"id : {post.id}")
    print(f"title : {post.title}")
    print(f"url : {post.url}")
    print(f"author : {str(post.author)} {type(str(post.author))}")
    print(f"score : {post.score} {type(post.score)} ")
    print(f"subreddit : {post.subreddit} {type(post.subreddit)} ")
    print(f"num_comments : {post.num_comments}")
    print(f"body : {post.selftext}")
    print(f"created : {post.created}")
    print(f"link_flair_text : {post.link_flair_text}")
    break  # break the loop after printing information about the first post

### Sub-Reddits

As described above, sub-reddits are communities organized around particular topics.

Some example sub-reddits:
 * https://www.reddit.com/r/datascience/
 * https://www.reddit.com/r/MachineLearning/
 * https://www.reddit.com/r/LanguageTechnology/
 * https://www.reddit.com/r/NLP/
 * https://www.reddit.com/r/Python/


# Exercise Tasks

## Part I: Data Acquisition and Loading 
1. Choose a subreddit of your choice. Preferably something of interest to you. 
1. Conceptualize a database design that can collect the data.
    * Make sure your items (posts) are unique and not duplicated!
    * Make sure you capture at least title, author, subreddit, tags, title link, and timestamp
    * Along with the metadata, capture all the text into one or more data field(s) suitable for information retrieval
    * Write triggers for auto updates of IR related fields
    * Add index (either GIN or GiST) for the IR related fields
    * Additionally, design a field to hold:
        * Sentiment
1. Implement the database in your PostgreSQL schema
1. Implement cells of Python Code that 
    * collect the latest posts from a subreddit of your choice (**should be text-dominant not image/video**) and collect at least 500 posts (if possible), 
    * processes the messages to extract metadata, 
    * process the text for IR, and 
    * perform computational linguistics (i.e, extract sentiment scores), 
    * then insert the data into your database.
1. After you have loaded data from a subreddit, choose a few more subreddits and load those!

## Part II: Analytics 

1. Write some test queries following the text vectors from Module 7.
1. Produce **interesting visualizations** of the linguistic data.
    * Try to look for trends (within a subreddit) and and variations of topics across subreddits
    * Some comparative plots across feeds
1. Write a summary of your findings!

 
 

# Part I: Data Acquisition and Loading

## Task 1: Design your database

Conceptualize a database design that can collect the data.
* Make sure your items (posts) are unique and not duplicated!
* Make sure you capture at least title, link, author, subreddit, tag/flair, and timestamp
* Capture all the body text into fields suitable for information retrieval
* Write triggers for auto updates of IR related fields
* Add index (either GIN or GiST) for the IR related fields
* Additionally, design a field to hold:
    - Sentiment



---

## Task 2: Implement the database in your PostgreSQL schema

You can choose any of the three ways to implement your database. 

* sql magic 
* sql terminal 
* psycopg2 or sqlalchemy


In [1]:
import getpass

# Initialize some variables
mysso="ssdn4"   
schema='ssdn4' 
hostname='pgsql.dsa.lan'
database='dsa_student'

mypasswd = getpass.getpass("Type Password and hit enter")
connection_string = f"postgres://{mysso}:{mypasswd}@{hostname}/{database}"

%load_ext sql
%sql $connection_string 
del mypasswd

Type Password and hit enter········


In [2]:
%%sql

DROP TABLE IF EXISTS reddits;

CREATE TABLE reddits(
    id varchar(250) NOT NULL,
    title varchar(500) NOT NULL,
    author varchar(250) NOT NULL,
    link varchar(250) NOT NULL,
    subreddit varchar(250) NOT NULL,
    tag varchar(250) NOT NULL,
    timestamp varchar(250) NOT NULL,
    content text NOT NULL,
    sentiment varchar(250) NOT NULL
);

ALTER TABLE reddits
ADD CONSTRAINT pk_reddits PRIMARY KEY (id);

 * postgres://ssdn4:***@pgsql.dsa.lan/dsa_student
Done.
Done.
Done.


[]

In [3]:
%%sql

DROP TRIGGER IF EXISTS tsv_gist_update on reddits;

CREATE TRIGGER tsv_gist_update 
    BEFORE INSERT OR UPDATE
    ON reddits
    FOR EACH ROW 
    EXECUTE PROCEDURE 
    tsvector_update_trigger(content_tsv_gist,'pg_catalog.english',content);

 * postgres://ssdn4:***@pgsql.dsa.lan/dsa_student
Done.
Done.


[]

In [4]:
%%sql
ALTER TABLE reddits
    ADD COLUMN content_tsv_gist tsvector;

UPDATE reddits
SET content_tsv_gist = to_tsvector('pg_catalog.english', content);

CREATE INDEX reddits_content_tsv_gist
ON reddits USING GIST(content_tsv_gist);

 * postgres://ssdn4:***@pgsql.dsa.lan/dsa_student
Done.
0 rows affected.
Done.


[]

In [5]:
%%sql

SELECT * FROM reddits
LIMIT 5

 * postgres://ssdn4:***@pgsql.dsa.lan/dsa_student
0 rows affected.


id,title,author,link,subreddit,tag,timestamp,content,sentiment,content_tsv_gist


## Task 3: Implement cells of Python Code that

* collect the latest posts from a subreddit of your choice (should be text-dominant not image/video) and collect at least 500 posts (if possible),
* processes the messages to extract id, title, link, author, subreddit, tag/flair, timestamp, etc. 
* process the text for IR, and
* perform computational linguistics (e.g., get sentiment scores)
* then insert the data into your database.


Notes: 
* Each call to Reddit API returns 100 entries max. If we set a limit of more than 100, PRAW will handle multiple API calls internally and lazily fetches data. Check obfuscation and API limitation in https://praw.readthedocs.io/en/v3.6.2/pages/getting_started.html. 
* Develop and test your code with less than 100 messages from a subreddit. Then increase the limit and add few more subreddits. 
* While loading the table, test with one row 


In [6]:
## Your code in this cell
## ------------------------

import praw

reddit = praw.Reddit(client_id='6Zlpzqm03D7YBnlC7eP5dQ', 
                     client_secret='nNTvXV_P1mrbda7buX1e4ufASxi_bA', 
                     user_agent='WebScraping')

In [7]:
hot_posts = reddit.subreddit('datascience').hot(limit=300)

In [8]:
all_posts = list(hot_posts)  

In [9]:
import pandas as pd
posts = []

for post in all_posts:
    posts.append([post.id, post.title, post.author, post.url, post.subreddit, post.link_flair_text, 
                  post.created, post.selftext])

## Task 6: After you have loaded data from a subreddit, choose a few more subreddit and load those!

Add cells if required

In [10]:
## Your code in this cell
## ------------------------

more_posts = reddit.subreddit('machinelearning').hot(limit=200)
more_posts = list(more_posts)

for post in more_posts:
    posts.append([post.id, post.title, post.author, post.url, post.subreddit, post.link_flair_text, 
                  post.created, post.selftext])

In [11]:
posts_df = pd.DataFrame(posts, columns=['id', 'title', 'author', 'url', 'subreddit', 'tag', 'timestamp', 'content']).dropna()
posts_df.head()

Unnamed: 0,id,title,author,url,subreddit,tag,timestamp,content
0,q56pjd,Weekly Entering & Transitioning Thread | 10 Oc...,datascience-bot,https://www.reddit.com/r/datascience/comments/...,datascience,Discussion,1633867000.0,Welcome to this week's entering & transitionin...
1,q8phlx,Is there a protocol for working with people wh...,rotterdamn8,https://www.reddit.com/r/datascience/comments/...,datascience,Discussion,1634308000.0,Hi all. I work in a very big company everyone ...
2,q8na06,Does anyone have experience with live dashboards?,TheMapesHotel,https://www.reddit.com/r/datascience/comments/...,datascience,Projects,1634300000.0,So I essentially need to build something for m...
3,q8w31u,Why it's so hard to collaborate with other DS?,stiff_neck_remedy,https://www.reddit.com/r/datascience/comments/...,datascience,Discussion,1634328000.0,"The title sounds bad, I know... I'm not talki..."
4,q8zq7c,Good Free Online Sources to Self-Study Excel/G...,The_Zhuster,https://www.reddit.com/r/datascience/comments/...,datascience,Discussion,1634339000.0,As title states. I recently graduated with a B...


In [12]:
import re
import nltk
from nltk.corpus import stopwords
nltk.download("stopwords")
stop_words = stopwords.words("english")

[nltk_data] Downloading package stopwords to /home/ssdn4/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [13]:
for val in posts_df['content']:
    if val not in stop_words:
        newval = re.sub(r'[\W]+', '', val.lower())
        posts_df['content'].replace(val,newval)

In [14]:
posts_df.head()

Unnamed: 0,id,title,author,url,subreddit,tag,timestamp,content
0,q56pjd,Weekly Entering & Transitioning Thread | 10 Oc...,datascience-bot,https://www.reddit.com/r/datascience/comments/...,datascience,Discussion,1633867000.0,Welcome to this week's entering & transitionin...
1,q8phlx,Is there a protocol for working with people wh...,rotterdamn8,https://www.reddit.com/r/datascience/comments/...,datascience,Discussion,1634308000.0,Hi all. I work in a very big company everyone ...
2,q8na06,Does anyone have experience with live dashboards?,TheMapesHotel,https://www.reddit.com/r/datascience/comments/...,datascience,Projects,1634300000.0,So I essentially need to build something for m...
3,q8w31u,Why it's so hard to collaborate with other DS?,stiff_neck_remedy,https://www.reddit.com/r/datascience/comments/...,datascience,Discussion,1634328000.0,"The title sounds bad, I know... I'm not talki..."
4,q8zq7c,Good Free Online Sources to Self-Study Excel/G...,The_Zhuster,https://www.reddit.com/r/datascience/comments/...,datascience,Discussion,1634339000.0,As title states. I recently graduated with a B...


In [15]:
#sentiment analyzer 
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
post_sentiment = [analyzer.polarity_scores(t) for t in posts_df.content]

In [16]:
sentdf = pd.DataFrame(post_sentiment)
sentdf['content'] = posts_df.content
sentdf['id'] = posts_df.id
sentdf = sentdf[['id','content', 'neg', 'neu', 'pos', 'compound']]
sentdf.head()

Unnamed: 0,id,content,neg,neu,pos,compound
0,q56pjd,Welcome to this week's entering & transitionin...,0.0,0.964,0.036,0.5093
1,q8phlx,Hi all. I work in a very big company everyone ...,0.047,0.838,0.115,0.9629
2,q8na06,So I essentially need to build something for m...,0.0,0.957,0.043,0.3182
3,q8w31u,"The title sounds bad, I know... I'm not talki...",0.096,0.799,0.105,0.7557
4,q8zq7c,As title states. I recently graduated with a B...,0.0,0.896,0.104,0.9555


In [17]:
sentdf['sentiment'] = 'NEU'
sentdf.loc[sentdf['compound'] > 0.5, 'sentiment'] = 'POS'
sentdf.loc[sentdf['compound'] < -0.5, 'sentiment'] = 'NEG'
sentdf.head()

Unnamed: 0,id,content,neg,neu,pos,compound,sentiment
0,q56pjd,Welcome to this week's entering & transitionin...,0.0,0.964,0.036,0.5093,POS
1,q8phlx,Hi all. I work in a very big company everyone ...,0.047,0.838,0.115,0.9629,POS
2,q8na06,So I essentially need to build something for m...,0.0,0.957,0.043,0.3182,NEU
3,q8w31u,"The title sounds bad, I know... I'm not talki...",0.096,0.799,0.105,0.7557,POS
4,q8zq7c,As title states. I recently graduated with a B...,0.0,0.896,0.104,0.9555,POS


In [18]:
df = posts_df.merge(sentdf, on='id', how='left').fillna("")
df = df.drop(axis=1, columns=['neg', 'neu', 'pos', 'compound','content_y'])
df.head()

Unnamed: 0,id,title,author,url,subreddit,tag,timestamp,content_x,sentiment
0,q56pjd,Weekly Entering & Transitioning Thread | 10 Oc...,datascience-bot,https://www.reddit.com/r/datascience/comments/...,datascience,Discussion,1633867000.0,Welcome to this week's entering & transitionin...,POS
1,q8phlx,Is there a protocol for working with people wh...,rotterdamn8,https://www.reddit.com/r/datascience/comments/...,datascience,Discussion,1634308000.0,Hi all. I work in a very big company everyone ...,POS
2,q8na06,Does anyone have experience with live dashboards?,TheMapesHotel,https://www.reddit.com/r/datascience/comments/...,datascience,Projects,1634300000.0,So I essentially need to build something for m...,NEU
3,q8w31u,Why it's so hard to collaborate with other DS?,stiff_neck_remedy,https://www.reddit.com/r/datascience/comments/...,datascience,Discussion,1634328000.0,"The title sounds bad, I know... I'm not talki...",POS
4,q8zq7c,Good Free Online Sources to Self-Study Excel/G...,The_Zhuster,https://www.reddit.com/r/datascience/comments/...,datascience,Discussion,1634339000.0,As title states. I recently graduated with a B...,POS


In [19]:
df = df.rename(columns={'content_x':'content','url':'link'})
#df['content_tsv_gin'] = ''
df.tail()

Unnamed: 0,id,title,author,link,subreddit,tag,timestamp,content,sentiment
488,q2ad6c,"[D] DNN options for multivariate, ragged tenso...",e1gord0,https://www.reddit.com/r/MachineLearning/comme...,MachineLearning,Discussion,1633484000.0,"Hypothetically, if you had a tabular dataset s...",
489,q20g5i,"[R] Google AI 0pen Sources ‘FedJAX’, A JAX-bas...",techsucker,https://www.reddit.com/r/MachineLearning/comme...,MachineLearning,Research,1633452000.0,Federated learning is a machine learning envir...,
490,q293wz,[D]Looking for Reviews and analysis of ML vend...,icurate,https://www.reddit.com/r/MachineLearning/comme...,MachineLearning,Discussion,1633479000.0,We have the opportunity to customize a DAM and...,
491,q1yrbx,[D] Multilingual Parallel dataset for Machine ...,amruh,https://www.reddit.com/r/MachineLearning/comme...,MachineLearning,Discussion,1633448000.0,"Hi, I've been searching a multilingual paralle...",
492,q27h6w,[D] Implement speaker identification module,hpk_platinium,https://www.reddit.com/r/MachineLearning/comme...,MachineLearning,Discussion,1633474000.0,"Hi everyone,\n\nI am seeking to create a speak...",


In [20]:
df=df.astype(str)

In [21]:
#load the data
from sqlalchemy import create_engine
df.to_sql('reddits', con=create_engine(connection_string), schema='ssdn4', index= False, if_exists='append')

In [22]:
%%sql

select * 
from reddits
limit 5

 * postgres://ssdn4:***@pgsql.dsa.lan/dsa_student
5 rows affected.


id,title,author,link,subreddit,tag,timestamp,content,sentiment,content_tsv_gist
q56pjd,Weekly Entering & Transitioning Thread | 10 Oct 2021 - 17 Oct 2021,datascience-bot,https://www.reddit.com/r/datascience/comments/q56pjd/weekly_entering_transitioning_thread_10_oct_2021/,datascience,Discussion,1633867230.0,"Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: * Learning resources (e.g. books, tutorials, videos) * Traditional education (e.g. schools, degrees, electives) * Alternative education (e.g. online courses, bootcamps) * Job search questions (e.g. resumes, applying, career prospects) * Elementary questions (e.g. where to start, what next) While you wait for answers from the community, check out the [FAQ](https://www.reddit.com/r/datascience/wiki/frequently-asked-questions) and [Resources](Resources) pages on our wiki. You can also search for answers in [past weekly threads](https://www.reddit.com/r/datascience/search?q=weekly%20thread&restrict_sr=1&sort=new).",POS,"'/r/datascience/search?q=weekly%20thread&restrict_sr=1&sort=new).':96 '/r/datascience/wiki/frequently-asked-questions)':76 'also':86 'altern':40 'answer':66,89 'appli':51 'book':31 'bootcamp':45 'career':52 'check':70 'communiti':69 'cours':44 'data':23 'degre':38 'e.g':30,36,42,49,56 'educ':35,41 'elect':39 'elementari':54 'enter':6 'faq':73 'field':25 'get':16 'includ':27 'job':46 'learn':28 'next':61 'onlin':43 'page':80 'past':91 'prospect':53 'question':14,48,55 'resourc':29,78,79 'resum':50 'school':37 'scienc':24 'search':47,87 'start':17,59 'studi':18 'thread':8,10,93 'topic':26 'tradit':34 'transit':7,20 'tutori':32 'video':33 'wait':64 'week':4,92 'welcom':1 'wiki':83 'www.reddit.com':75,95 'www.reddit.com/r/datascience/search?q=weekly%20thread&restrict_sr=1&sort=new).':94 'www.reddit.com/r/datascience/wiki/frequently-asked-questions)':74"
q8phlx,Is there a protocol for working with people who make really bad code?,rotterdamn8,https://www.reddit.com/r/datascience/comments/q8phlx/is_there_a_protocol_for_working_with_people_who/,datascience,Discussion,1634307730.0,"Hi all. I work in a very big company everyone knows, and just started on a new project. I was brought in to work on a new phase of this project so we're not starting from scratch. The existing team has brought me up to speed. What they've implemented is a train wreck (it works but not very elegant). I'm a solidly intermediate programmer and data guy. I don't stand so tall that I'm gonna judge anyone, but I definitely take care to write clean, commented code that others can read and debug if needed. I use functions appropriately. I've been doing Python for some years and started doing legit OOP this year. I got the hang of it. I am now inheriting someone's messy Python. Duplicate ""import \[some library\]"" statements, almost no functions, zero objects (which I realize is not always needed), passwords saved in scripts, only a few comments here and there. They've been saving SQL scripts in Teams. What? No one thought to create a repository in the company's private Github?? I'm sure some of you have been on this side of it (while some of you have been on the other side). How did you handle it? Note 1: I could have asked this in r/programming, but I think this is probably more prevalent in data. A lot of hacks! :) Note 2: this is a genuine question, not a rant. Just want to hear others' experience.",POS,"'1':213 '2':236 'almost':139 'alway':149 'anyon':82 'appropri':104 'ask':217 'big':8 'brought':21,43 'care':87 'clean':90 'code':92 'comment':91,158 'compani':9,180 'could':215 'creat':175 'data':69,230 'debug':98 'definit':85 'duplic':134 'eleg':61 'everyon':10 'exist':40 'experi':250 'function':103,141 'genuin':240 'github':183 'gonna':80 'got':121 'guy':70 'hack':234 'handl':210 'hang':123 'hear':248 'hi':1 'implement':51 'import':135 'inherit':129 'intermedi':66 'judg':81 'know':11 'legit':116 'librari':137 'lot':232 'm':63,79,185 'messi':132 'need':100,150 'new':17,27 'note':212,235 'object':143 'one':172 'oop':117 'other':94,249 'password':151 'phase':28 'preval':228 'privat':182 'probabl':226 'programm':67 'project':18,31 'python':109,133 'question':241 'r/programming':220 'rant':244 're':34 'read':96 'realiz':146 'repositori':177 'save':152,165 'scratch':38 'script':154,167 'side':194,206 'solid':65 'someon':130 'speed':47 'sql':166 'stand':74 'start':14,36,114 'statement':138 'sure':186 'take':86 'tall':76 'team':41,169 'think':223 'thought':173 'train':54 'use':102 've':50,106,163 'want':246 'work':4,24,57 'wreck':55 'write':89 'year':112,119 'zero':142"
q8na06,Does anyone have experience with live dashboards?,TheMapesHotel,https://www.reddit.com/r/datascience/comments/q8na06/does_anyone_have_experience_with_live_dashboards/,datascience,Projects,1634300459.0,"So I essentially need to build something for my company that can port data from a qualtrics on a regular basis for around a hundred people and create dashboards for the users. I'd like both me and the user around my state to be able to access the dashboard, thus the live connection piece. I know I can export the data from qualtrics but then it lives with me on my computer and I'd have to send it to the user. Thoughts?",NEU,"'abl':46 'access':48 'around':23,41 'basi':21 'build':6 'compani':10 'comput':73 'connect':54 'creat':28 'd':34,76 'dashboard':29,50 'data':14,62 'essenti':3 'export':60 'hundr':25 'know':57 'like':35 'live':53,68 'need':4 'peopl':26 'piec':55 'port':13 'qualtric':17,64 'regular':20 'send':79 'someth':7 'state':43 'thought':84 'thus':51 'user':32,40,83"
q8w31u,Why it's so hard to collaborate with other DS?,stiff_neck_remedy,https://www.reddit.com/r/datascience/comments/q8w31u/why_its_so_hard_to_collaborate_with_other_ds/,datascience,Discussion,1634327593.0,"The title sounds bad, I know... I'm not talking about collaboration in general team work or working environment, but a project-level collaboration: when multiple data scientists work together for one project. I found that it's so much efficient and well... peaceful when I take the initiative of the entire processes, from model scoping, coding, production to documentation. When collaborating with other data scientists for building one model, things can get overly complicated and political. Even splitting the tasks and deciding who does what are not easy. DS is still a new way of doing things, and there are no strict rules or industry standard processes. If there are few, they might not applicable for a company-specific business problems. Also, the field attracts relatively more competitive people than other jobs do, which can lead to team members to compete against each other to get more recognition, glory, interesting future projects and what not. i just want to get shits done without all the dramas and politics. I rarely had this kind of the issue when I was working with other teams as a sole data scientist. Is this normal to feel this way? If you had a similar issue, how did you handle the situation? TLDR I'm having a hard time working together with other data scientists with shared responsibility. Is this hard for you too, or I'm being the DIFFICULT one in the team?",POS,"'also':125 'applic':117 'attract':128 'bad':4 'build':69 'busi':123 'code':58 'collabor':12,25,63 'compani':121 'company-specif':120 'compet':144 'competit':131 'complic':76 'data':28,66,190,222 'decid':84 'difficult':238 'document':61 'done':165 'drama':169 'ds':91 'easi':90 'effici':42 'entir':53 'environ':19 'even':79 'feel':196 'field':127 'found':36 'futur':154 'general':14 'get':74,149,163 'glori':152 'handl':208 'hard':216,229 'industri':107 'initi':50 'interest':153 'issu':179,204 'job':135 'kind':176 'know':6 'lead':139 'level':24 'm':8,213,235 'member':142 'might':115 'model':56,71 'much':41 'multipl':27 'new':95 'normal':194 'one':33,70,239 'over':75 'peac':45 'peopl':132 'polit':78,171 'problem':124 'process':54,109 'product':59 'project':23,34,155 'project-level':22 'rare':173 'recognit':151 'relat':129 'respons':226 'rule':105 'scientist':29,67,191,223 'scope':57 'share':225 'shit':164 'similar':203 'situat':210 'sole':189 'sound':3 'specif':122 'split':80 'standard':108 'still':93 'strict':104 'take':48 'talk':10 'task':82 'team':15,141,186,242 'thing':72,99 'time':217 'titl':2 'tldr':211 'togeth':31,219 'want':161 'way':96,198 'well':44 'without':166 'work':16,18,30,183,218"
q8zq7c,Good Free Online Sources to Self-Study Excel/Google Sheets + Tableau for Data Analytics/Data Science,The_Zhuster,https://www.reddit.com/r/datascience/comments/q8zq7c/good_free_online_sources_to_selfstudy_excelgoogle/,datascience,Discussion,1634339204.0,"As title states. I recently graduated with a B.S. Computer Science, Summa Cum Laude, but I wanted to switch gears from software engineering to data analytics or data science. So there are certain fields that I want to self-study that I have heard are commonly sought for in data analytics/data science that I did not get to learn in the handful of data science courses I took for electives in my final year of college. I am already in the process of studying SQL and R as I finish up free sources that I will mention in later paragraph, but I was wondering if I could get recommendations for free online sources for learning either of the following: Tableau, Excel/Google Sheets? In this paragraph, I'll mention the sources I used to study SQL and R and I was wondering if any of these at first glance seemed effective or not, otherwise, I am open to other sources for these languages to stay sharp in my mastery of such: * SQL: [https://www.sqlcourse2.com](https://www.sqlcourse2.com), [https://www.w3schools.com/sql/](https://www.w3schools.com/sql/) * R: [https://www.codecademy.com/learn/learn-r](https://www.codecademy.com/learn/learn-r)",POS,"'/learn/learn-r](https://www.codecademy.com/learn/learn-r)':181 '/sql/](https://www.w3schools.com/sql/)':177 'alreadi':80 'analyt':26 'analytics/data':52 'b.s':9 'certain':33 'colleg':77 'common':47 'comput':10 'could':108 'cours':67 'cum':13 'data':25,28,51,65 'effect':151 'either':117 'elect':71 'engin':23 'excel/google':122 'field':34 'final':74 'finish':91 'first':148 'follow':120 'free':93,112 'gear':20 'get':58,109 'glanc':149 'graduat':6 'hand':63 'heard':45 'languag':163 'later':100 'laud':14 'learn':60,116 'll':128 'masteri':169 'mention':98,129 'onlin':113 'open':157 'otherwis':154 'paragraph':101,126 'process':83 'r':88,138,178 'recent':5 'recommend':110 'scienc':11,29,53,66 'seem':150 'self':40 'self-studi':39 'sharp':166 'sheet':123 'softwar':22 'sought':48 'sourc':94,114,131,160 'sql':86,136,172 'state':3 'stay':165 'studi':41,85,135 'summa':12 'switch':19 'tableau':121 'titl':2 'took':69 'use':133 'want':17,37 'wonder':105,142 'www.codecademy.com':180 'www.codecademy.com/learn/learn-r](https://www.codecademy.com/learn/learn-r)':179 'www.sqlcourse2.com':173,174 'www.w3schools.com':176 'www.w3schools.com/sql/](https://www.w3schools.com/sql/)':175 'year':75"



### In part II, we will search your database as `dsa_ro_user user`. To prepare your DB to be read, you will need to grant the dsa_ro_user schema access and select privileges on your table.

```SQL
GRANT USAGE ON SCHEMA <your schema> TO dsa_ro_user;  -- NOTE: change to your schema
GRANT SELECT ON <your table> TO dsa_ro_user;
```

In [23]:
%%sql

GRANT USAGE ON SCHEMA ssdn4 TO dsa_ro_user;
GRANT SELECT ON reddits TO dsa_ro_user;

 * postgres://ssdn4:***@pgsql.dsa.lan/dsa_student
Done.
Done.


[]

# Save your notebook, then `File > Close and Halt`

---