# No Man's Sky.
Natural language processing.

### Foreword
Given the massive number of reviews for the game, is there a way to generate unique summaries in relative to certain conditions? Of course, and that's where the concept of <b>natural language processing</b> comes in.

The reviews are going to be divided into <b>years</b> and rather they are <b>positive</b> or <b>negative</b>, and there will be a dataframe built upon the many different summaries generated.

In [1]:
import pandas as pd
from datetime import datetime

import math

import nltk
from nltk.corpus import stopwords
from nltk.cluster.util import cosine_distance
import numpy as np
import networkx as nx

In [2]:
df = pd.read_json("./mns-data/nms_data.json")

In [3]:
maindata = df[:121916]
metadata = df[121916:]

In [4]:
id_data = [maindata['reviews'][i]['recommendationid'] for i in range(len(maindata))]
mnsdata = pd.DataFrame(id_data, columns = ['id'])
mnsdata

Unnamed: 0,id
0,92447055
1,92446235
2,92445739
3,92445679
4,92445399
...,...
121911,24845464
121912,24845460
121913,24845437
121914,24845428


In [5]:
reviews_data = [maindata['reviews'][i]['review'] for i in range(len(maindata))]
review_votes = [maindata['reviews'][i]['voted_up'] for i in range(len(maindata))]

mnsdata['Reviews'] = reviews_data
mnsdata['Recommended?'] = review_votes

mnsdata

Unnamed: 0,id,Reviews,Recommended?
0,92447055,this is a game on the steam store,False
1,92446235,Space is great fun yes is great fun\n,True
2,92445739,Played half of my time on launch day in 2016 a...,True
3,92445679,Pretty impressive that they went out and made ...,True
4,92445399,"Very fun open world game, however the multipla...",True
...,...,...,...
121911,24845464,"Quite nice, actually.",True
121912,24845460,Rest in peace Harambe. Your name will always b...,True
121913,24845437,"1ST PUBLIC REVIEW!\n\nEDIT: Quick tips, left c...",True
121914,24845428,Its pretty good tbh\nEDIT: Its pretty good but...,False


In [6]:
timestamp = [maindata['reviews'][i]['timestamp_updated'] for i in range(len(maindata))]
date = [datetime.fromtimestamp(timestamp[i]).strftime('%Y-%m-%d') for i in range(len(timestamp))]

mnsdata['Date'] = date
mnsdata['Yearly'] = mnsdata['Date'].str[0:4]
mnsdata['Monthly'] = mnsdata['Date'].str[0:4]+mnsdata['Date'].str[5:7]

mnsdata

Unnamed: 0,id,Reviews,Recommended?,Date,Yearly,Monthly
0,92447055,this is a game on the steam store,False,2021-05-22,2021,202105
1,92446235,Space is great fun yes is great fun\n,True,2021-05-22,2021,202105
2,92445739,Played half of my time on launch day in 2016 a...,True,2021-05-22,2021,202105
3,92445679,Pretty impressive that they went out and made ...,True,2021-05-22,2021,202105
4,92445399,"Very fun open world game, however the multipla...",True,2021-05-22,2021,202105
...,...,...,...,...,...,...
121911,24845464,"Quite nice, actually.",True,2016-08-13,2016,201608
121912,24845460,Rest in peace Harambe. Your name will always b...,True,2018-07-25,2018,201807
121913,24845437,"1ST PUBLIC REVIEW!\n\nEDIT: Quick tips, left c...",True,2016-08-12,2016,201608
121914,24845428,Its pretty good tbh\nEDIT: Its pretty good but...,False,2016-08-17,2016,201608


This is very much built like the analytics side of things but with much fewer columns. We only need <b>Reviews</b>, <b>Recommended?</b>, <b>Date</b>, <b>Yearly</b>, and <b>Monthly</b>. Everything else is background noise.

### Natural language processing

In [7]:
def sentence_similarity(sent1, sent2, stopwords=None):
    if stopwords is None:
        stopwords = []
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
    all_words = list(set(sent1+sent2))
    
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
    
    for w in sent1:
        if w in stopwords:
            continue
        vector1[all_words.index(w)] += 1
        
    for w in sent2:
        if w in stopwords:
            continue
        vector2[all_words.index(w)] += 1
        
    return 1-cosine_distance(vector1, vector2)

def gen_sim_matrix(sentences, stop_words):
    similarity_matrix = np.zeros((len(sentences), len(sentences)))
    for idx1 in range(len(sentences)):
        for idx2 in range(len(sentences)):
            if idx1 == idx2:
                continue
            similarity_matrix[idx1][idx2] = sentence_similarity(sentences[idx1], sentences[idx2], stop_words)
            
    return similarity_matrix

def generate_summary(reviews, top_n=5):
    stop_words = stopwords.words('english')
    summarize_text = []
    sentence_similarity_matrix = gen_sim_matrix(reviews, stop_words)
    sentence_similarity_graph = nx.from_numpy_array(sentence_similarity_matrix)
    scores = nx.pagerank(sentence_similarity_graph)
    ranked_sentence = sorted(((scores[i], s) for i, s in enumerate(reviews)), reverse=True)

    for i in range(top_n):
        summarize_text.append("".join(ranked_sentence[i][1]))
        
    return summarize_text

Code provided from here: https://www.youtube.com/watch?v=dFe7tbH39Eg

### Summerisation

#### NLP 2016 positives

In [8]:
mns_2016_positive = []

for i in range(len(mnsdata.loc[(mnsdata['Yearly'] == '2016') & (mnsdata['Recommended?'] == True)])):
    if 10 <= len(mnsdata.loc[(mnsdata['Yearly'] == '2016') & (mnsdata['Recommended?'] == True)]['Reviews'].values[i]) <= 30:
        mns_2016_positive.append(mnsdata.loc[(mnsdata['Yearly'] == '2016') & (mnsdata['Recommended?'] == True)]['Reviews'].values[i])

In [9]:
nlp_initial_2016_positive = [mns_2016_positive[(i*50):(i+1)*50] for i in range(math.ceil(len(mns_2016_positive)/50))]
nlp_aggregate_2016_positive = []

for i in range(len(nlp_initial_2016_positive)):
    try:
        nlp_aggregate_2016_positive += generate_summary(nlp_initial_2016_positive[i])
    except:
        continue

  return 1 - (numpy.dot(u, v) / (sqrt(numpy.dot(u, u)) * sqrt(numpy.dot(v, v))))


In [10]:
nlp_aggregate_2016_positive.pop(19)
nlp_aggregate_2016_positive

['New update is looking great.',
 'New update is pretty good',
 'the game is pritty good now',
 "It's a decent game.",
 'It might get better.',
 'New update is great.',
 'i just wanted to be different',
 'Best Money Disposal Simulator',
 'I just want to be different.',
 'It is a nice game.',
 'I strangely like this game.',
 'best game ive played in years.',
 'I really enjoy this game',
 'I LOVE THIS GAME',
 'pretty good i guess',
 'well its fine i guess',
 'I liked the hype I guess.',
 'get it son',
 "It's all about the adventure.",
 'Very good and beatiful game',
 "I like it, even can't stop.",
 'The most amazing game ever !!',
 "I've really enjoyed this game.",
 'Love it, Dont care.',
 'i love this game',
 'EPIC game for us explorers ;)',
 'I represent the sub reddit ;)',
 'Fun exploration game',
 'The Perfect Stoner Game',
 "I's a great game I love it.",
 'I like being lied to!',
 'Cool game - great time killer',
 'so far its a great game',
 'I really like this game',
 'Loving the g

In [11]:
nlp_aggregate_2016_positive.pop(35)
nlp_aggregate_2016_positive

['New update is looking great.',
 'New update is pretty good',
 'the game is pritty good now',
 "It's a decent game.",
 'It might get better.',
 'New update is great.',
 'i just wanted to be different',
 'Best Money Disposal Simulator',
 'I just want to be different.',
 'It is a nice game.',
 'I strangely like this game.',
 'best game ive played in years.',
 'I really enjoy this game',
 'I LOVE THIS GAME',
 'pretty good i guess',
 'well its fine i guess',
 'I liked the hype I guess.',
 'get it son',
 "It's all about the adventure.",
 'Very good and beatiful game',
 "I like it, even can't stop.",
 'The most amazing game ever !!',
 "I've really enjoyed this game.",
 'Love it, Dont care.',
 'i love this game',
 'EPIC game for us explorers ;)',
 'I represent the sub reddit ;)',
 'Fun exploration game',
 'The Perfect Stoner Game',
 "I's a great game I love it.",
 'I like being lied to!',
 'Cool game - great time killer',
 'so far its a great game',
 'I really like this game',
 'Loving the g

In [12]:
nlp_aggregate_2016_positive.pop(51)
nlp_aggregate_2016_positive

['New update is looking great.',
 'New update is pretty good',
 'the game is pritty good now',
 "It's a decent game.",
 'It might get better.',
 'New update is great.',
 'i just wanted to be different',
 'Best Money Disposal Simulator',
 'I just want to be different.',
 'It is a nice game.',
 'I strangely like this game.',
 'best game ive played in years.',
 'I really enjoy this game',
 'I LOVE THIS GAME',
 'pretty good i guess',
 'well its fine i guess',
 'I liked the hype I guess.',
 'get it son',
 "It's all about the adventure.",
 'Very good and beatiful game',
 "I like it, even can't stop.",
 'The most amazing game ever !!',
 "I've really enjoyed this game.",
 'Love it, Dont care.',
 'i love this game',
 'EPIC game for us explorers ;)',
 'I represent the sub reddit ;)',
 'Fun exploration game',
 'The Perfect Stoner Game',
 "I's a great game I love it.",
 'I like being lied to!',
 'Cool game - great time killer',
 'so far its a great game',
 'I really like this game',
 'Loving the g

In [13]:
nlp_aggregate_2016_positive.pop(59)
nlp_aggregate_2016_positive

['New update is looking great.',
 'New update is pretty good',
 'the game is pritty good now',
 "It's a decent game.",
 'It might get better.',
 'New update is great.',
 'i just wanted to be different',
 'Best Money Disposal Simulator',
 'I just want to be different.',
 'It is a nice game.',
 'I strangely like this game.',
 'best game ive played in years.',
 'I really enjoy this game',
 'I LOVE THIS GAME',
 'pretty good i guess',
 'well its fine i guess',
 'I liked the hype I guess.',
 'get it son',
 "It's all about the adventure.",
 'Very good and beatiful game',
 "I like it, even can't stop.",
 'The most amazing game ever !!',
 "I've really enjoyed this game.",
 'Love it, Dont care.',
 'i love this game',
 'EPIC game for us explorers ;)',
 'I represent the sub reddit ;)',
 'Fun exploration game',
 'The Perfect Stoner Game',
 "I's a great game I love it.",
 'I like being lied to!',
 'Cool game - great time killer',
 'so far its a great game',
 'I really like this game',
 'Loving the g

In [14]:
nlp_2016_positive = generate_summary(nlp_aggregate_2016_positive)
nlp_2016_positive

['best game ive played in years.',
 'i love the game soo far',
 'good  but it need some repeirs',
 'Very good and beatiful game',
 'Very fun game lots to do.']

#### NLP 2016 negatives

In [15]:
mns_2016_negative = []

for i in range(len(mnsdata.loc[(mnsdata['Yearly'] == '2016') & (mnsdata['Recommended?'] == False)])):
    if 10 <= len(mnsdata.loc[(mnsdata['Yearly'] == '2016') & (mnsdata['Recommended?'] == False)]['Reviews'].values[i]) <= 30:
        mns_2016_negative.append(mnsdata.loc[(mnsdata['Yearly'] == '2016') & (mnsdata['Recommended?'] == False)]['Reviews'].values[i])

In [16]:
nlp_initial_2016_negative = [mns_2016_negative[(i*50):(i+1)*50] for i in range(math.ceil(len(mns_2016_negative)/50))]
nlp_aggregate_2016_negative = []

for i in range(len(nlp_initial_2016_negative)):
    try:
        nlp_aggregate_2016_negative += generate_summary(nlp_initial_2016_negative[i])
    except:
        continue

In [17]:
nlp_aggregate_2016_negative

['Just dont buy this trash game.',
 'A total waste of time.',
 '4 thumbs down for the dev',
 'it act like a penis',
 'Dont. Not even if its on sale.',
 'Better than playing with Pedro',
 'DO NOT BUY THIS GAME.',
 'Waste of time and money',
 'lets see this new update',
 'This game is really bad',
 'legit could not be worse',
 'Would be a great 5$ game',
 'this game sucks big time.',
 "Just don't. (Even in sales)",
 'No enough to do',
 'I tried to ignore the lies',
 'Worst game ever.  That is all.',
 'This game was a let down.',
 'this game sux donkey ballz.',
 'Writing this review at 98 days',
 'Not worth the time or money.',
 'pile of steaming horse shit',
 "It's tech demo, nothing more.",
 'Wtf even is this garbage.',
 'No Mans Sky = One Big Lie.. \n\n',
 "Seriously don't get it",
 'Giant lie, do not buy.',
 "Don't waste your time or cash.",
 "Just don't even try this game",
 "Waste of time. Don't buy.",
 'i forgot i owned this game',
 'DONT EVER BUY THIS GAME AT 70$',
 'isnt like how

In [18]:
nlp_2016_negative = generate_summary(nlp_aggregate_2016_negative)
nlp_2016_negative

['I DEMAND A REFUND N THIS GAME.',
 'WHY  WONT IT LET ME REFUND.',
 'get a refund while you can',
 'Would not recommend this game',
 'When do I get my refund?']

#### NLP 2017 positives

In [19]:
mns_2017_positive = []

for i in range(len(mnsdata.loc[(mnsdata['Yearly'] == '2017') & (mnsdata['Recommended?'] == True)])):
    if 10 <= len(mnsdata.loc[(mnsdata['Yearly'] == '2017') & (mnsdata['Recommended?'] == True)]['Reviews'].values[i]) <= 30:
        mns_2017_positive.append(mnsdata.loc[(mnsdata['Yearly'] == '2017') & (mnsdata['Recommended?'] == True)]['Reviews'].values[i])

In [20]:
nlp_initial_2017_positive = [mns_2017_positive[(i*50):(i+1)*50] for i in range(math.ceil(len(mns_2017_positive)/50))]
nlp_aggregate_2017_positive = []

for i in range(len(nlp_initial_2017_positive)):
    try:
        nlp_aggregate_2017_positive += generate_summary(nlp_initial_2017_positive[i])
    except:
        continue

In [21]:
nlp_aggregate_2017_positive

['A nice game to get lost in.',
 'me dude its great now',
 'Better than a year ago.',
 'nice game . come on',
 'i cant even use it',
 'Now the game looks great!',
 'The game is super fun now.',
 'Great game and nice view!',
 "This game confirms I'm a nerd.",
 'they kinda fixed it i guess',
 'its getting alot better.',
 'After update game is good',
 'A true exploration game.',
 'its like a proper game now',
 'The game is getting better.',
 'the cake is no longer a lie',
 'New Update is really good',
 "Way better than Old Man's Sky",
 'much better now !',
 'Would love to see this in VR',
 'The 1.3 update is really nice.',
 '1.3 is here, so get playing.',
 'I think its pretty decent.',
 'Just because I love this game.',
 'Updates made it great.',
 'hey man at least theyre trying',
 'dis game is very seet',
 "It's getting better !",
 'Keep it up Hello Games',
 'Hey, its pretty good!',
 'This game got a lot better.',
 'The game is now fixed.',
 'Gettin better all the time.',
 'Favorite game

In [22]:
nlp_2017_positive = generate_summary(nlp_aggregate_2017_positive)
nlp_2017_positive

['me dude its great now',
 'Better than a year ago.',
 'This game got a lot better.',
 'hey man at least theyre trying',
 'its getting better, thats it']

#### NLP 2017 negatives

In [23]:
mns_2017_negative = []

for i in range(len(mnsdata.loc[(mnsdata['Yearly'] == '2017') & (mnsdata['Recommended?'] == False)])):
    if 10 <= len(mnsdata.loc[(mnsdata['Yearly'] == '2017') & (mnsdata['Recommended?'] == False)]['Reviews'].values[i]) <= 30:
        mns_2017_negative.append(mnsdata.loc[(mnsdata['Yearly'] == '2017') & (mnsdata['Recommended?'] == False)]['Reviews'].values[i])

In [24]:
nlp_initial_2017_negative = [mns_2017_negative[(i*50):(i+1)*50] for i in range(math.ceil(len(mns_2017_negative)/50))]
nlp_aggregate_2017_negative = []

for i in range(len(nlp_initial_2017_negative)):
    try:
        nlp_aggregate_2017_negative += generate_summary(nlp_initial_2017_negative[i])
    except:
        continue

In [25]:
nlp_aggregate_2017_negative.pop(8)
nlp_aggregate_2017_negative

['I tried to like this game.',
 'Wish i could get a refund',
 'Shit. Wasted alot of money',
 'The lag is immersive.',
 'Best game of all time.',
 'Bigger let down than religion.',
 'A ploy for money.',
 'Please do not buy this!',
 "No Man's Scam, trash game",
 'how do i get a refund',
 "Why didn't I request a refund",
 'Not what has been promised',
 'Bad game retarded dont buy.',
 'Worst game that I ever played',
 'Lies and promised too much',
 'Too much hype for a trash game',
 'how do i refund this',
 'lyin sons of a bitches',
 'worst game of the year',
 'Do not buy. At least for now',
 'F this company and game.',
 "I got my refund so I'm happy",
 'Still stinks after new updates',
 'yep. its a big disapointment',
 'This game cured my insomnia.',
 'Kill me for buying htis game',
 'trash dont ever get this game',
 'DO NOT GET IT',
 'Not worth my money',
 'Get it on sale.',
 'Thankfully I got a refund.',
 'greatest scam in steam history',
 'Still a huge disappointment',
 'why the game c

In [26]:
nlp_aggregate_2017_negative.pop(24)
nlp_aggregate_2017_negative

['I tried to like this game.',
 'Wish i could get a refund',
 'Shit. Wasted alot of money',
 'The lag is immersive.',
 'Best game of all time.',
 'Bigger let down than religion.',
 'A ploy for money.',
 'Please do not buy this!',
 "No Man's Scam, trash game",
 'how do i get a refund',
 "Why didn't I request a refund",
 'Not what has been promised',
 'Bad game retarded dont buy.',
 'Worst game that I ever played',
 'Lies and promised too much',
 'Too much hype for a trash game',
 'how do i refund this',
 'lyin sons of a bitches',
 'worst game of the year',
 'Do not buy. At least for now',
 'F this company and game.',
 "I got my refund so I'm happy",
 'Still stinks after new updates',
 'yep. its a big disapointment',
 'Kill me for buying htis game',
 'trash dont ever get this game',
 'DO NOT GET IT',
 'Not worth my money',
 'Get it on sale.',
 'Thankfully I got a refund.',
 'greatest scam in steam history',
 'Still a huge disappointment',
 'why the game cant run',
 'GIVE US OUR MONEY BAC

In [27]:
nlp_2017_negative = generate_summary(nlp_aggregate_2017_negative)
nlp_2017_negative

['how do i get a refund',
 'Wish i could get a refund',
 'Lies and promised too much',
 'why the game cant run',
 'Shit. Wasted alot of money']

#### NLP 2018 positive

In [28]:
mns_2018_positive = []

for i in range(len(mnsdata.loc[(mnsdata['Yearly'] == '2018') & (mnsdata['Recommended?'] == True)])):
    if 10 <= len(mnsdata.loc[(mnsdata['Yearly'] == '2018') & (mnsdata['Recommended?'] == True)]['Reviews'].values[i]) <= 30:
        mns_2018_positive.append(mnsdata.loc[(mnsdata['Yearly'] == '2018') & (mnsdata['Recommended?'] == True)]['Reviews'].values[i])

In [29]:
nlp_initial_2018_positive = [mns_2018_positive[(i*50):(i+1)*50] for i in range(math.ceil(len(mns_2018_positive)/50))]
nlp_aggregate_2018_positive = []

for i in range(len(nlp_initial_2018_positive)):
    try:
        nlp_aggregate_2018_positive += generate_summary(nlp_initial_2018_positive[i])
    except:
        continue

In [30]:
nlp_aggregate_2018_positive

['Best Revival of any game imo.',
 'Its a fun Game ive enjoyed',
 'this game is fun and endless',
 'A great space exploration game',
 'There now is men in the sky.',
 'This game simply amazes me.',
 'It really has come a long way',
 'Better than at launch.',
 'from the ashes a phoenix rises',
 'This game is nuts',
 'Still the best game on Steam.',
 'best to play with friends',
 'Great comeback! love this game',
 "It's better than it was.",
 'its pretty alright I guess',
 'NICE keep going Hello games.',
 'The game that keeps on giving!',
 'getting better with updates',
 'Most comeback game of the year',
 'someof the views are stunning',
 'Complete time sink great fun',
 'Fun game.  Free updates, lots.',
 'better than before I guess',
 'Better than it was',
 'Best when played with friends.',
 'Great game many many updates.',
 'Is better than it was before.',
 'Got a lot better.',
 'it got better i swear\n',
 'it got better',
 'A very fun game.',
 'Amazing game, worth the price.',
 'Its a

In [31]:
nlp_2018_positive = generate_summary(nlp_aggregate_2018_positive)
nlp_2018_positive

['now ist a real game',
 'its better now i guess',
 'Its pretty good now yeah',
 'The game got a lot better.',
 'Is now the game advertised.']

#### NLP 2018 negatives

In [32]:
mns_2018_negative = []

for i in range(len(mnsdata.loc[(mnsdata['Yearly'] == '2018') & (mnsdata['Recommended?'] == False)])):
    if 10 <= len(mnsdata.loc[(mnsdata['Yearly'] == '2018') & (mnsdata['Recommended?'] == False)]['Reviews'].values[i]) <= 30:
        mns_2018_negative.append(mnsdata.loc[(mnsdata['Yearly'] == '2018') & (mnsdata['Recommended?'] == False)]['Reviews'].values[i])

In [33]:
nlp_initial_2018_negative = [mns_2018_negative[(i*50):(i+1)*50] for i in range(math.ceil(len(mns_2018_negative)/50))]
nlp_aggregate_2018_negative = []

for i in range(len(nlp_initial_2018_negative)):
    try:
        nlp_aggregate_2018_negative += generate_summary(nlp_initial_2018_negative[i])
    except:
        continue

In [34]:
nlp_aggregate_2018_negative

['Game runs like shit',
 'LOWER THE FUCKING PRICE',
 "Doesn't have any personality.",
 'no deap sea anymore',
 'A very boring game.',
 'Stil not a high quality game.',
 'I did not have fun.',
 'got a refund',
 "mindless autism. No Man's Buy",
 'Gets boring VERY quickly',
 'still runs like dog shit',
 'Boring and confusing as hell.',
 'runs like ass',
 'And this game is still shit.',
 'more like, no guy buy XD',
 'um  game is legit broken',
 'Enough has already been said.',
 'Crap game. Want a refund.',
 'worst game I ever bought.',
 'The NEXT Update is awful.',
 'Not at all what was expected.',
 'low fps in game with decent pc',
 'best game i ever play',
 'Not worth the time or money.',
 'It was a deception.']

In [35]:
nlp_2018_negative = generate_summary(nlp_aggregate_2018_negative)
nlp_2018_negative

['Game runs like shit',
 'um  game is legit broken',
 'Enough has already been said.',
 'Not worth the time or money.',
 'more like, no guy buy XD']

#### NLP 2019 positives

In [36]:
mns_2019_positive = []

for i in range(len(mnsdata.loc[(mnsdata['Yearly'] == '2019') & (mnsdata['Recommended?'] == True)])):
    if 10 <= len(mnsdata.loc[(mnsdata['Yearly'] == '2019') & (mnsdata['Recommended?'] == True)]['Reviews'].values[i]) <= 30:
        mns_2019_positive.append(mnsdata.loc[(mnsdata['Yearly'] == '2019') & (mnsdata['Recommended?'] == True)]['Reviews'].values[i])

In [37]:
nlp_initial_2019_positive = [mns_2019_positive[(i*50):(i+1)*50] for i in range(math.ceil(len(mns_2019_positive)/50))]
nlp_aggregate_2019_positive = []

for i in range(len(nlp_initial_2019_positive)):
    try:
        nlp_aggregate_2019_positive += generate_summary(nlp_initial_2019_positive[i])
    except:
        continue

  return 1 - (numpy.dot(u, v) / (sqrt(numpy.dot(u, u)) * sqrt(numpy.dot(v, v))))


In [38]:
nlp_aggregate_2019_positive

['good game would recommend',
 'ITS EN REALY GOED GAME',
 'A very fun game in space.',
 'glad they didnt give up',
 'a very fun game!',
 'Bad game turned v good game',
 'Turned into a great game',
 'its gotten alot better',
 'BEST MUSIC SOFTWARE EDITING!',
 'It is GREAT',
 'This game has aged wonderfully',
 'Got alot better with updates\n',
 'Is a nice game',
 'is big good, very recommend.',
 'My new favorite game.',
 'Needs more things to do',
 'latest updates are amazing.',
 'This is a really relaxing game',
 'i mean its pre good now',
 'It is nice to explore planets!',
 "It's gotten much better.",
 'this game is gangster',
 'Ye its pretty good',
 'eh. it got better',
 'Just a fucking great game!',
 'The best open world game',
 'the game is better now',
 'Eight. Take it, or leave it',
 'Has turned into a nice game',
 'Very fun game.',
 'Only gets better with time',
 'Meh to very GREAT GAME.',
 'Started shit. Now its great',
 'Best in VR yet',
 'Most improved game in history?',
 'pret

In [39]:
nlp_2019_positive = generate_summary(nlp_aggregate_2019_positive)
nlp_2019_positive

['The game is great now.',
 'ITS EN REALY GOED GAME',
 'Great Game  Worth the money.',
 'Needs more things to do',
 'Bad game gets good. Very nice']

#### NLP 2019 negatives

In [40]:
mns_2019_negative = []

for i in range(len(mnsdata.loc[(mnsdata['Yearly'] == '2019') & (mnsdata['Recommended?'] == False)])):
    if 10 <= len(mnsdata.loc[(mnsdata['Yearly'] == '2019') & (mnsdata['Recommended?'] == False)]['Reviews'].values[i]) <= 30:
        mns_2019_negative.append(mnsdata.loc[(mnsdata['Yearly'] == '2019') & (mnsdata['Recommended?'] == False)]['Reviews'].values[i])

In [41]:
nlp_initial_2019_negative = [mns_2019_negative[(i*50):(i+1)*50] for i in range(math.ceil(len(mns_2019_negative)/50))]
nlp_aggregate_2019_negative = []

for i in range(len(nlp_initial_2019_negative)):
    try:
        nlp_aggregate_2019_negative += generate_summary(nlp_initial_2019_negative[i])
    except:
        continue

In [42]:
nlp_aggregate_2019_negative.pop(12)
nlp_aggregate_2019_negative

['worst game on steam.',
 'shit game. dont buy it.',
 'ok but controls are terrible',
 'Waste of my hard earned cash.',
 'Multiplayer is too screwed up.',
 'it is SUPER boring holy hell',
 'runs like ass',
 'This game is extremely boring.',
 'Runs so poorly I returned it',
 'game is broken and sucks',
 'gets boring after like an hour',
 'This game is really boring.',
 'how do i get a refud',
 'Never loads , waste of money',
 'fuck this game its still trash',
 'Still a yikes from me chief.',
 "Can't rebind keys",
 'No mans promises.',
 "Just don't buy it"]

In [43]:
nlp_2019_negative = generate_summary(nlp_aggregate_2019_negative)
nlp_2019_negative

['gets boring after like an hour',
 'This game is really boring.',
 'Waste of my hard earned cash.',
 'it is SUPER boring holy hell',
 'worst game on steam.']

#### NLP 2020 positives

In [44]:
mns_2020_positive = []

for i in range(len(mnsdata.loc[(mnsdata['Yearly'] == '2020') & (mnsdata['Recommended?'] == True)])):
    if 10 <= len(mnsdata.loc[(mnsdata['Yearly'] == '2020') & (mnsdata['Recommended?'] == True)]['Reviews'].values[i]) <= 30:
        mns_2020_positive.append(mnsdata.loc[(mnsdata['Yearly'] == '2020') & (mnsdata['Recommended?'] == True)]['Reviews'].values[i])

In [45]:
nlp_initial_2020_positive = [mns_2020_positive[(i*50):(i+1)*50] for i in range(math.ceil(len(mns_2020_positive)/50))]
nlp_aggregate_2020_positive = []

for i in range(len(nlp_initial_2020_positive)):
    try:
        nlp_aggregate_2020_positive += generate_summary(nlp_initial_2020_positive[i])
    except:
        continue

  return 1 - (numpy.dot(u, v) / (sqrt(numpy.dot(u, u)) * sqrt(numpy.dot(v, v))))
  return 1 - (numpy.dot(u, v) / (sqrt(numpy.dot(u, u)) * sqrt(numpy.dot(v, v))))
  return 1 - (numpy.dot(u, v) / (sqrt(numpy.dot(u, u)) * sqrt(numpy.dot(v, v))))


In [46]:
nlp_aggregate_2020_positive

['Fun game to play with friends',
 'great game to kill some time.',
 'THE BEST EXPLORATION RPG EVER',
 'such a wonderfull game',
 'great game, well worth playing',
 'its good i like very nice',
 'I recommend this game.',
 'Very good game, i cant stop',
 'Good game lots of exploration.',
 'I love Space games like this',
 'really good open world game',
 'Its a good Space-Adventure',
 'dis geam great very fun friend',
 'very good fun game',
 'one of my most favorite game\n',
 'Getting very nice this game.',
 'Kinda like a pretty good game',
 'Great game easy to play',
 'Sweet game, great to chill',
 'it is a great game\n',
 'very lil time consuming game',
 'Great game to play on VR',
 'yes womans land is pretty good',
 'A Good story line, endless fun',
 'Cool game i guess',
 'Good game to just relax with.',
 'Easy to learn, beautiful game',
 'awesome tons to do',
 'used to be bad, is now good.',
 'game good alien planets fly',
 'This game is a suprise.',
 'A game that keeps on giving.',
 

In [47]:
nlp_2020_positive = generate_summary(nlp_aggregate_2020_positive)
nlp_2020_positive

['Great game lots to see and do',
 'game is a true game now',
 'So good and only gets better',
 'very fun game to get lost in',
 'This game is very fun Gamers']

#### NLP 2020 negatives

In [48]:
mns_2020_negative = []

for i in range(len(mnsdata.loc[(mnsdata['Yearly'] == '2020') & (mnsdata['Recommended?'] == False)])):
    if 10 <= len(mnsdata.loc[(mnsdata['Yearly'] == '2020') & (mnsdata['Recommended?'] == False)]['Reviews'].values[i]) <= 30:
        mns_2020_negative.append(mnsdata.loc[(mnsdata['Yearly'] == '2020') & (mnsdata['Recommended?'] == False)]['Reviews'].values[i])

In [49]:
nlp_initial_2020_negative = [mns_2020_negative[(i*50):(i+1)*50] for i in range(math.ceil(len(mns_2020_negative)/50))]
nlp_aggregate_2020_negative = []

for i in range(len(nlp_initial_2020_negative)):
    try:
        nlp_aggregate_2020_negative += generate_summary(nlp_initial_2020_negative[i])
    except:
        continue

In [50]:
nlp_aggregate_2020_negative

['gets boring after 2 hours',
 'It is a boring game.',
 'game nhu cai lon',
 "CAN'T LOAD INTO FRIENDS GAME",
 'complete and utter shit',
 'Only got shit freighters.',
 'Quite frankly, the game sucks',
 'I hate this fucking tutorial.',
 'n ever mind famix was right',
 "Can't even load the game.",
 'no gameplay on this game',
 'Just not my style of game.',
 'I want a freaking refund steam',
 'This is a game inside of a bug',
 'new crap still boring.',
 'i just cant start the game.',
 'Runs like shit on AMD',
 'it blue screens all time',
 'Game randomly crashes',
 'Go play Astroneer',
 "I don't care anymore",
 'Better than it was',
 'Still not a video game yet.',
 "Can't fly into the sun, 0/10",
 'it is still boring']

In [51]:
nlp_2020_negative = generate_summary(nlp_aggregate_2020_negative)
nlp_2020_negative

['i just cant start the game.',
 "CAN'T LOAD INTO FRIENDS GAME",
 'n ever mind famix was right',
 'no gameplay on this game',
 'Runs like shit on AMD']

#### NLP 2021 positives

In [52]:
mns_2021_positive = []

for i in range(len(mnsdata.loc[(mnsdata['Yearly'] == '2021') & (mnsdata['Recommended?'] == True)])):
    if 10 <= len(mnsdata.loc[(mnsdata['Yearly'] == '2021') & (mnsdata['Recommended?'] == True)]['Reviews'].values[i]) <= 30:
        mns_2021_positive.append(mnsdata.loc[(mnsdata['Yearly'] == '2021') & (mnsdata['Recommended?'] == True)]['Reviews'].values[i])

In [53]:
nlp_initial_2021_positive = [mns_2021_positive[(i*50):(i+1)*50] for i in range(math.ceil(len(mns_2021_positive)/50))]
nlp_aggregate_2021_positive = []

for i in range(len(nlp_initial_2021_positive)):
    try:
        nlp_aggregate_2021_positive += generate_summary(nlp_initial_2021_positive[i])
    except:
        continue

In [54]:
nlp_aggregate_2021_positive

['Great game lots of new content',
 'Love the big open universe',
 'awesome game. very fun',
 'It never ends and never fails.',
 'just keeps getting better',
 'this game is ruining my life',
 'the game is lit',
 'gucci mane lil mosey is white\n',
 'gud game. fun gamer moment.',
 'yeah its pretty good\n',
 'Love it, get it nerd.',
 'it good game if like space\n',
 'Space Exploration gud & fun',
 "There's a lot here to enjoy!",
 'A lot of fun especially in VR.',
 'its space and there are frogs',
 'Love my pet T-Rexy thing!',
 'So fun! Love the grind <3',
 'THE GAME IS FIXED AND WHOLE!',
 'its a great game',
 'this game is better on pc.',
 'Good game. Would recomend',
 'this game is endless.',
 'The best space game for me.',
 'Space game is fun',
 'crashing but i love the game',
 'space or something',
 'the game good rely good',
 'Noice Game ( SagAmire ?)',
 'IT DO BE GREAT GAM NGL',
 "It's a really nice game.",
 'Great game with new Content .\n',
 'Pretty fun game',
 'It is so great and 

In [55]:
nlp_2021_positive = generate_summary(nlp_aggregate_2021_positive)
nlp_2021_positive

['Endless hours of a good game',
 'yeah is good i recommend',
 'the most fun boring game ever',
 'this game is better on pc.',
 'crashing but i love the game']

#### NLP 2021 negatives

In [56]:
mns_2021_negative = []

for i in range(len(mnsdata.loc[(mnsdata['Yearly'] == '2021') & (mnsdata['Recommended?'] == False)])):
    if 10 <= len(mnsdata.loc[(mnsdata['Yearly'] == '2021') & (mnsdata['Recommended?'] == False)]['Reviews'].values[i]) <= 30:
        mns_2021_negative.append(mnsdata.loc[(mnsdata['Yearly'] == '2021') & (mnsdata['Recommended?'] == False)]['Reviews'].values[i])

In [57]:
nlp_initial_2021_negative = [mns_2021_negative[(i*50):(i+1)*50] for i in range(math.ceil(len(mns_2021_negative)/50))]
nlp_aggregate_2021_negative = []

for i in range(len(nlp_initial_2021_negative)):
    try:
        nlp_aggregate_2021_negative += generate_summary(nlp_initial_2021_negative[i])
    except:
        continue

In [58]:
nlp_aggregate_2021_negative

['dont buy this game',
 "No man's play this game.",
 'Cant reach stars, bad game',
 "It's a refund sim",
 'This is NOT a COOP game!',
 'Sean your are a lying bitсh!',
 'still dont like the game sorry',
 'The monitor cannot be rotated.',
 "Game doesn't load anymore",
 "CAN'T EVEN RUN THE GAME."]

In [59]:
nlp_2021_negative = generate_summary(nlp_aggregate_2021_negative)
nlp_2021_negative

['Sean your are a lying bitсh!',
 'The monitor cannot be rotated.',
 "CAN'T EVEN RUN THE GAME.",
 "Game doesn't load anymore",
 "No man's play this game."]

The error handling exists because there are cases where even a batch (50) of reviews would not have anything in common, therefore it returns an error. Rather stepping in to see if it could be salvaged, it is more optimal to be skipped instead.

Even after that, there may be a few reviews that do not fit the right category or they are low-effort trolling reviews. The reviews are then individually removed, and the batches of reviews are then recompiled.

That should do it! Now it is time for a new dataframe built on our refined reviews.

### NLP dataframe

In [60]:
year_data = 2*[i for i in range(2016, 2022)]
nlpdata = pd.DataFrame(year_data, columns = ['Year'])
nlpdata

Unnamed: 0,Year
0,2016
1,2017
2,2018
3,2019
4,2020
5,2021
6,2016
7,2017
8,2018
9,2019


In [61]:
nlp_2016_positive_string = '. '.join([i for i in nlp_2016_positive])
nlp_2017_positive_string = '. '.join([i for i in nlp_2017_positive])
nlp_2018_positive_string = '. '.join([i for i in nlp_2018_positive])
nlp_2019_positive_string = '. '.join([i for i in nlp_2019_positive])
nlp_2020_positive_string = '. '.join([i for i in nlp_2020_positive])
nlp_2021_positive_string = '. '.join([i for i in nlp_2021_positive])

In [62]:
nlp_2016_negative_string = '. '.join([i for i in nlp_2016_negative])
nlp_2017_negative_string = '. '.join([i for i in nlp_2017_negative])
nlp_2018_negative_string = '. '.join([i for i in nlp_2018_negative])
nlp_2019_negative_string = '. '.join([i for i in nlp_2019_negative])
nlp_2020_negative_string = '. '.join([i for i in nlp_2020_negative])
nlp_2021_negative_string = '. '.join([i for i in nlp_2021_negative])

In [63]:
nlp_positives = [nlp_2016_positive_string, nlp_2017_positive_string, nlp_2018_positive_string, nlp_2019_positive_string, nlp_2020_positive_string, nlp_2021_positive_string]
nlp_negatives = [nlp_2016_negative_string, nlp_2017_negative_string, nlp_2018_negative_string, nlp_2019_negative_string, nlp_2020_negative_string, nlp_2021_negative_string]

In [64]:
positive_review_count = [len(mnsdata.loc[(mnsdata['Yearly'] == f'{i}') & (mnsdata['Recommended?'] == True)]['Reviews']) for i in range(2016, 2022)]
negative_review_count = [len(mnsdata.loc[(mnsdata['Yearly'] == f'{i}') & (mnsdata['Recommended?'] == True)]['Reviews']) for i in range(2016, 2022)]

In [65]:
nlpdata['Reviews'] = nlp_positives+nlp_negatives
nlpdata['Category'] = ['Positive']*6 + ['Negative']*6
nlpdata['Count'] = positive_review_count + negative_review_count
nlpdata

Unnamed: 0,Year,Reviews,Category,Count
0,2016,best game ive played in years.. i love the gam...,Positive,14545
1,2017,me dude its great now. Better than a year ago....,Positive,5641
2,2018,now ist a real game. its better now i guess. I...,Positive,14430
3,2019,The game is great now.. ITS EN REALY GOED GAME...,Positive,12779
4,2020,Great game lots to see and do. game is a true ...,Positive,28361
5,2021,Endless hours of a good game. yeah is good i r...,Positive,8162
6,2016,I DEMAND A REFUND N THIS GAME.. WHY WONT IT L...,Negative,14545
7,2017,how do i get a refund. Wish i could get a refu...,Negative,5641
8,2018,Game runs like shit. um game is legit broken....,Negative,14430
9,2019,gets boring after like an hour. This game is r...,Negative,12779


The dataframe is structured this way because of how the storyboard will be set up.

### Datasheet export

In [66]:
nlpdata.to_csv('nomansskynlp.csv')

## Credits:
Immense gratitude to <b>Forerunners</b> for their code.

Links to their accounts can be found here: <br>
https://www.youtube.com/channel/UCOv8xYSnv27f-z5SiKLYajQ <br>
https://twitter.com/Forerunners_PS