In [1]:
# Notebook-wide definitions.

import time
import json
import numpy

from plotly import __version__
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go

init_notebook_mode(connected=True)

# Exploring Relationships Between Sentiment And Updates

This is a grouping of explorations that focus on the role user score plays in the system - seeking to understand what it is influenced by, and any relationships it has with other metrics.

The most basic representation of sentiment is the latest snapshot of the user score, which is the percentage of reviews that are positive (having deemed user score to be a sound metric). More granular is the user score at a specific time interval (which must be computed), and most granular is the timeseries of positive and negative reviews (by day for the sake of simplicity; most of the pre-prepared metrics use days as the smallest unit of time).
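
As a rough illustration of how a user score at a given day could be computed from the daily review timeseries, the sketch below accumulates positive and negative votes and reports the positive share per day. The `(voted_up, voted_down)` input shape and the function name are assumptions made for this example, not the format of the notebook's dumps.

```python
from itertools import accumulate


def daily_user_scores(daily_votes):
    """Return the cumulative user score (0-100) at the end of each day.

    daily_votes is a list of (voted_up, voted_down) counts per day. This
    shape is hypothetical and chosen only for illustration.
    """
    total_up = list(accumulate(up for up, _ in daily_votes))
    total_down = list(accumulate(down for _, down in daily_votes))

    scores = []
    for up, down in zip(total_up, total_down):
        total = up + down
        # With no reviews yet the score is undefined rather than zero.
        scores.append(None if total == 0 else round(100 * up / total))
    return scores


example_votes = [(0, 0), (3, 1), (1, 1), (0, 4)]
example_scores = daily_user_scores(example_votes)
print(example_scores)
```

Returning `None` for days without reviews keeps an undefined score from being mistaken for a score of zero.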

## U1

### Aim

Products that receive more updates also receive higher ratings.

### Purpose

This is based on the assumption that in an update, a developer will attempt to expand and improve the experience of the user or increase positive engagement, and therefore there should be a positive correlation between update and score metrics. If this holds true, updates have an obvious impact that can be further explored; however, the result may be skewed by products that happened to be well received upon release. Furthermore, it will not account for opinion that has waned over time.

### Investigation

I will first look at how many games are in each score band, as this may help explain some further results.

I will then take the number of updates and user score for each product and present them in a scatter plot. I believe there will be an aggregation of points to the top-right; ergo, the games that have been updated the most should also have the higher ratings.

In [2]:
user_score_distribution = json.load(open('./dumps/user-score-distribution.json'))

x = [item['score'] for item in user_score_distribution]
y = [len(item['review_count']) for item in user_score_distribution]

layout = go.Layout(
    title='Number Of Games Per User Score',
    xaxis=dict(
        title="User Score"
    ),
    yaxis=dict(
        title="Total Products"
    ),
    showlegend=False
)

data = [go.Bar(
    x=x,
    y=y
)]

fig = go.Figure(data=data, layout=layout)

iplot(fig)

In [3]:
user_score_update_distribution = json.load(open('./dumps/user-score-update-distribution.json'))

x = []
y = []

for item in user_score_update_distribution:
    x.append(item['score'])
    y.append(
        numpy.std(item['update_counts'])
    )
    
layout = go.Layout(
    title='Standard Deviation of Total Updates Per User Score',
    hovermode='closest',
    xaxis=dict(
        title="User Score"
    ),
    yaxis=dict(
        title="Standard Deviation"
    ),
    showlegend=False
)    

data = [go.Bar(
    x=x,
    y=y
)]

fig = go.Figure(data=data, layout=layout)

iplot(fig)


Degrees of freedom <= 0 for slice


invalid value encountered in true_divide


invalid value encountered in double_scalars



In [34]:
user_score_update_count = json.load(open('./dumps/user-score-update-count.json'))

x = []
y = []

for item in user_score_update_count:
    x.append(item['true_user_score'])
    y.append(item['update_count'])
    
layout = go.Layout(
    title='Updates and User Scores',
    hovermode='closest',
    xaxis=dict(
        title="True User Score"
    ),
    yaxis=dict(
        title="No. Updates"
    ),
    showlegend=False
)    

data = [go.Scatter(
    x=x,
    y=y,
    mode = 'markers'
)]

fig = go.Figure(data=data, layout=layout)
    
iplot(fig)

### Results

The pattern is much weaker than I was expecting; however, it is still visible: **all games that have more than 124 updates have a user score higher than 50** (and ergo more than half of their reviewers like them). I did not expect each score band to still contain a solid representation of games with so few updates; this leads me to suspect that some of these have fewer reviews behind them, making their user score less useful.

I want to grade the points based on the number of reviews that informed their score, and see how this differs when using 'true score'.

## U2

### Aim

Products that receive more updates also receive higher ratings - graded by review count.

### Purpose

To see if the trend is more apparent in products that have a user score based on more opinions (and therefore more trustworthy).

### Investigation

I use an additional color dimension relative to the base-2 logarithm of the number of reviews, and further reduce the opacity of products below a threshold (fewer than 1,024 reviews). Then, I plot the same chart using score rank instead.

In [35]:
user_score_update_count = json.load(open('./dumps/user-score-update-count.json'))

x = []
y = []
colors = []
opacities = []
tags = []

for item in user_score_update_count:
    x.append(item['true_user_score'])
    y.append(item['update_count'])
    tags.append("Product Id: {0}, Total Reviews: {1}".format(
            item['product_id'],
            item['review_count']
        )
    )
    review_log = numpy.log2(item['review_count'])
    colors.append(
        review_log
    )
    if review_log < 10:
        opacities.append(0.1)
    else:
        opacities.append(1.0)
    
layout = go.Layout(
    title='Updates and User Scores',
    hovermode='closest',
    xaxis=dict(
        title="True User Score"
    ),
    yaxis=dict(
        title="No. Updates"
    ),
    showlegend=False
)

data = [go.Scatter(
    x=x,
    y=y,
    mode='markers',
    marker=dict(
        color=colors,
        opacity=opacities,
        colorscale='Viridis'
    ),
    text=tags
)]

fig = go.Figure(data=data, layout=layout)
    
iplot(fig)

In a later experiment I created the concept of update frequency, which is applicable here: looking at the user score as a static snapshot is not necessarily representative. We can see whether there is a correlation between final user score and update frequency.

### Results

The pattern suggests that products that are updated more often do receive more positive recommendations. Conversely, unsuccessful products rarely reach more than 100 updates. However, since the standard deviation generally appears to increase as the score increases, this suggests that products do not need to rely on updates to be successful.

When the products are placed in the context of the entire platform (ergo, ranked based on userscore proportional to all userscores), there are no meaningful patterns.

## U3

### Aim

Updates improve a product's user score over time.

### Purpose

In addition to acknowledging a product's latest user score, we can look at whether or not updates improve a score over a period of time - if they do then this may explain the trend above. A product that is inherently flawed may end up with a bad user score, however I want to see if updates can help even games that start poorly.

I think that due to the noise of the dataset there will be many products for which no distinguishable pattern emerges, however there will be a large subset of products for which this does hold true.

### Investigation

The simplest way of measuring this is to look at products for which their initial review score is less than their current review score. This creates a problem in that all products will start out with user scores that fluctuate drastically until a certain threshold of reviews balances them out. Therefore, a reasonable start point to construct a user score must be a timestamp offset by that threshold.

The influence of a single review's recommendation on the score is at most 1/(n+1), where n is the number of reviews already submitted; therefore, after 20 reviews each new review has a bearing of less than 5% on the overall user score.
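
The arithmetic behind the 20-review threshold can be checked with a short sketch. It assumes the score is the share of positive reviews, so that one additional review after n existing ones can shift that share by at most 1/(n+1); the function name is hypothetical.

```python
def max_influence(n):
    """Largest possible shift in the positive share caused by review n + 1."""
    return 1 / (n + 1)


# Smallest number of existing reviews at which one more review
# moves the score by strictly less than 5%.
threshold = next(n for n in range(1, 1000) if max_influence(n) < 0.05)

print(threshold, round(max_influence(threshold) * 100, 2))
```

At 19 existing reviews the bound is exactly 5%, which is why the cut-off falls at 20.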

Therefore, the criterion is: products that have received at least one update after their 20th review, and that have furthermore received at least one review after that update.
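
A minimal sketch of this selection rule is shown below. It assumes each product is described by two lists of Unix timestamps (`review_times` and `update_times`); these names and shapes are hypothetical and may not match how the candidates were actually queried.

```python
def is_candidate(review_times, update_times, settle_after=20):
    """Check the selection criterion for a single product.

    review_times and update_times are lists of Unix timestamps
    (hypothetical inputs for illustration).
    """
    if len(review_times) < settle_after:
        return False

    reviews = sorted(review_times)
    settled_at = reviews[settle_after - 1]  # time of the 20th review

    # Updates that landed after the score had settled.
    late_updates = [t for t in update_times if t > settled_at]
    if not late_updates:
        return False

    # At least one review must follow the earliest such update.
    return reviews[-1] > min(late_updates)


example_reviews = list(range(1, 26))  # 25 reviews at timestamps 1..25
print(is_candidate(example_reviews, [22]))
```

Comparing the latest review against the earliest qualifying update is enough, because any later review satisfies the "at least one review after that point" condition.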

This will ensure that I am only looking at products for which an update has occurred after a review score has settled, and will hopefully provide a sample size large enough to explore. It is flawed as it will ignore products for which updates may have helped, but opinion has for some reason waned since the last update; this is a separate investigation. The criterion is deliberately simple and broad so that the complexity of the investigation can be built up incrementally.

I found 897 potential candidates. For these, I then create the review score for each at 20 reviews versus the present, while also pulling in the number of reviews to use later.

In [6]:
user_score_start_end = json.load(open('./dumps/user-score-start-end.json'))

number_increased = 0
number_remained_same = 0
number_decreased = 0

for item in user_score_start_end:
    if item['present_user_score'] > item['start_user_score']:
        number_increased += 1
    elif item['present_user_score'] == item['start_user_score']:
        number_remained_same += 1
    else:
        number_decreased += 1

print("Start User Score versus Present User Score:")
print("Number increased: " + str(number_increased) + " (" + str(round(number_increased / 897 * 100, 2)) + "%)")
print("Number remained the same: " + str(number_remained_same) + " (" + str(round(number_remained_same / 897 * 100, 2)) + "%)")
print("Number decreased: " + str(number_decreased) + " (" + str(round(number_decreased / 897 * 100, 2)) + "%)")

Start User Score versus Present User Score:
Number increased: 156 (17.39%)
Number remained the same: 48 (5.35%)
Number decreased: 693 (77.26%)


The number of products for which their user score has decreased is much larger than I'd expect. Since starting reviews are biased towards positivity, I must utilise the user score at the midpoint (discounting products for which there is no midpoint).

In [7]:
number_increased = 0
number_remained_same = 0
number_decreased = 0

product_ids_increased = []
product_ids_decreased = []
product_ids_remained_same = []

changes = []

increased_percentages = []
decreased_percentages = []

number_skipped = 0

for item in user_score_start_end:
    # Account for products that are too new for this test.
    if item['review_midpoint_time'] <= item['twentieth_review_time']:
        number_skipped += 1
        continue
        
    if item['present_user_score'] > item['score_at_midpoint']:
        number_increased += 1
        product_ids_increased.append(item['product_id'])
        change = item['present_user_score'] - item['score_at_midpoint']
        increased_percentages.append(
            change
        )
        changes.append({
            'product_id': item['product_id'],
            'score_change_midpoint_to_present': change
        })
        
    elif item['present_user_score'] == item['score_at_midpoint']:
        number_remained_same += 1
        product_ids_remained_same.append(item['product_id'])
    else:
        number_decreased += 1
        product_ids_decreased.append(item['product_id'])
        change = item['present_user_score'] - item['score_at_midpoint']
        decreased_percentages.append(
            change
        )
        changes.append({
            'product_id': item['product_id'],
            'score_change_midpoint_to_present': change
        })
        
print("Midpoint User Score versus Present User Score:")
print("Number increased: " + str(number_increased) + " (" + str(round(number_increased / 859 * 100, 2)) + "%)")
# Use the 859 products that were not skipped as the denominator, matching the other two lines.
print("Number remained the same: " + str(number_remained_same) + " (" + str(round(number_remained_same / 859 * 100, 2)) + "%)")
print("Number decreased: " + str(number_decreased) + " (" + str(round(number_decreased / 859 * 100, 2)) + "%)")

Midpoint User Score versus Present User Score:
Number increased: 146 (17.0%)
Number remained the same: 94 (10.94%)
Number decreased: 619 (72.06%)


This does not improve the number of products that increased from their midpoint to their endpoint - instead, it promotes the number that remain the same. There are several options here to determine why.

- I can look at how much those that increased increased by
- I can look at how much those that decreased decreased by
- I can attempt to see if there are mutual traits present in each group
  - Update frequency
  - Popularity
  - Genre (perhaps too far-reaching here)

In [8]:
print("Average score increase of products whose score increased: " + str(round(numpy.mean(increased_percentages), 2)))
print("Median score increase of products whose score increased: " + str(numpy.median(increased_percentages)))
print("Highest score increase of products whose score increased: " + str(numpy.amax(increased_percentages)))

# Really still 'increase', but for sake of plain language use...
print("Average score decrease of products whose score decreased: " + str(round(numpy.mean(decreased_percentages), 2)))
print("Median score decrease of products whose score decreased: " + str(numpy.median(decreased_percentages)))
print("Highest score decrease of products whose score decreased: " + str(numpy.amin(decreased_percentages)))

Average score increase of products whose score increased: 4.6
Median score increase of products whose score increased: 2.0
Highest score increase of products whose score increased: 35
Average score decrease of products whose score decreased: -6.67
Median score decrease of products whose score decreased: -4.0
Highest score decrease of products whose score decreased: -66


As we can see, from their midpoint in time, most products do not deviate wildly from their score, suggesting that at a sufficient midpoint the review score may be 'settled'. It seems that products that decline in user score more often decline to a greater extent. Note that this is absolute score change - it does not account for games without much room for improvement.

Note at this point that we cannot confirm that the score is a result of updates - that can be assessed using change detection later.

We can compute the update frequency (per four weeks) of products and compare them to these two key subsets.
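
One plausible way to derive such a frequency is sketched below. It assumes the frequency is the number of updates divided by the number of four-week windows between a product's first update and the end of data collection; the dumps used here may have been computed with a different start point (for example the release date), and the function name is hypothetical.

```python
FOUR_WEEKS = 2419200  # 28 days in seconds


def update_frequency_per_four_weeks(update_times, end_time):
    """Average number of updates per four-week window.

    Assumes the window runs from the first update to end_time
    (an assumption for illustration).
    """
    if not update_times:
        return 0.0

    span = end_time - min(update_times)
    if span <= 0:
        return 0.0

    return len(update_times) / (span / FOUR_WEEKS)


example_updates = [0, FOUR_WEEKS, 2 * FOUR_WEEKS]
example_frequency = update_frequency_per_four_weeks(example_updates, 4 * FOUR_WEEKS)
print(example_frequency)
```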

In [9]:
update_frequencies = json.load(open('./dumps/update-frequencies.json'))

product_ids_increased_frequencies = []
product_ids_decreased_frequencies = []

for update_frequency in update_frequencies:
    if update_frequency['score_change_midpoint_to_present'] > 0:
        product_ids_increased_frequencies.append(update_frequency['update_frequency_per_four_weeks'])
    else:
        product_ids_decreased_frequencies.append(update_frequency['update_frequency_per_four_weeks'])
        
print("Average update frequency of products whose scores increased: " + str(round(numpy.mean(product_ids_increased_frequencies), 2)))
print("Median update frequency of products whose scores increased: " + str(round(numpy.median(product_ids_increased_frequencies), 2)))
print("Max update frequency of products whose scores increased: " + str(round(numpy.amax(product_ids_increased_frequencies), 2)))

print("Average update frequency of products whose scores decreased: " + str(round(numpy.mean(product_ids_decreased_frequencies), 2)))
print("Median update frequency of products whose scores decreased: " + str(round(numpy.median(product_ids_decreased_frequencies), 2)))
print("Max update frequency of products whose scores decreased: " + str(round(numpy.amax(product_ids_decreased_frequencies), 2)))

Average update frequency of products whose scores increased: 2.15
Median update frequency of products whose scores increased: 1.65
Max update frequency of products whose scores increased: 15.6
Average update frequency of products whose scores decreased: 1.85
Median update frequency of products whose scores decreased: 1.24
Max update frequency of products whose scores decreased: 13.07


As we can see, products whose user score improved over time were updated slightly more frequently (a mean of 2.15 updates per four weeks against 1.85) than those whose scores decreased. We can graph this to see more easily whether there is a correlation.

In [10]:
x = [item['update_frequency_per_four_weeks'] for item in update_frequencies]
y = [item['score_change_midpoint_to_present'] for item in update_frequencies]
names = [item['product_id'] for item in update_frequencies]
colors = []

for item in update_frequencies:
    for user_score in user_score_start_end:
        if item['product_id'] == user_score['product_id']:
            colors.append(numpy.log10(user_score['review_count']))
            break

data = [go.Scatter(
    x=x,
    y=y,
    mode='markers',
    marker=dict(
        color=colors,
        colorscale='Viridis',
        colorbar=dict(
            title='Popularity'
        ),
    ),
    hovertext=names
)]

layout = go.Layout(
    title='Update Frequency and Score',
    hovermode='closest',
    xaxis=dict(
        title="Update Frequency (per four weeks)"
    ),
    yaxis=dict(
        title="Change in User Score Midpoint to Present"
    ),
    showlegend=False
) 

fig = go.Figure(data=data, layout=layout)
    
iplot(fig)

print(numpy.corrcoef(x=x, y=y))

[[ 1.          0.15371049]
 [ 0.15371049  1.        ]]


We see only a weak positive correlation across the whole set (r ≈ 0.15); however, there are extra features to consider. There is a much greater density of score drops towards the lower end of update frequency. This seems to suggest that drastic drops in score are associated with a lower update frequency.

Furthermore, we have not yet considered products for which there was not much room for improvement - I mapped their midpoint score to a colour gradient, looking to see if most above the line were close to 100, and this was not the case, as many products did also significantly improve in user score over their lifetime. I changed the colour gradient to 'popularity' derived from their total reviews in order to categorise products on the graph.

To reduce the effect of noise, we can compute the sum and mean of the score changes of the products that decreased, grouped into update-frequency buckets 0.5 wide.

In [30]:
x = [item['update_frequency_per_four_weeks'] for item in update_frequencies if item['score_change_midpoint_to_present'] < 0]
y = [item['score_change_midpoint_to_present'] for item in update_frequencies if item['score_change_midpoint_to_present'] < 0]

x2 = []
y2 = []
y3 = []

changes_in_user_score_per_update_frequency = []

for item in update_frequencies:
    if(item['score_change_midpoint_to_present'] > 0):
        continue
        
    # Group the scores into bins to make curve robust.
    uf = item['update_frequency_per_four_weeks']
    uf = uf - (uf % 0.5)
    
    found = False
    for c in changes_in_user_score_per_update_frequency:
        if c['update_frequency'] == uf:
            c['changes'].append(item['score_change_midpoint_to_present'])
            found = True

    if not found:
        changes_in_user_score_per_update_frequency.append({
            'update_frequency': uf,
            'changes': [
                item['score_change_midpoint_to_present']
            ]
        })

changes_in_user_score_per_update_frequency.sort(key=lambda x: x['update_frequency'])
    
for item in changes_in_user_score_per_update_frequency:
    x2.append(item['update_frequency'])
    y2.append(numpy.sum(item['changes']))
        
for item in changes_in_user_score_per_update_frequency:
    y3.append(numpy.mean(item['changes']))
    
data = [go.Scatter(
    x=x,
    y=y,
    mode='markers',
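    # Hover labels must be filtered the same way as x and y so they stay aligned.
    hovertext=[item['product_id'] for item in update_frequencies if item['score_change_midpoint_to_present'] < 0],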
    name="Individual"
),
go.Scatter(
    x=x2,
    y=y2,
    mode='lines',
    line=dict(
        shape='spline'
    ),
    name="Sum changes per update frequency band"
),
go.Scatter(
    x=x2,
    y=y3,
    mode='lines',
    line=dict(
        shape='spline'
    ),
    name="Mean changes per update frequency band"
)]

layout.showlegend=True

fig = go.Figure(data=data, layout=layout)

iplot(fig)

print("Correlation: " + str(numpy.corrcoef(x=x2, y=y2)))

Correlation: [[ 1.        0.681821]
 [ 0.681821  1.      ]]


There is a reasonable correlation visible; it is muddied somewhat by a lack of representation in the data set of updates at higher bands and the influence of outliers, but it is enough to warrant further investigation of explicit timeseries in order to ensure that what is happening is at least in part due to updates.

We build a timeseries graph of some products' user scores per day, along with some other informative metrics, and look into each:
- One that has improved over time.
- One that has decreased over time.
- One that has remained relatively the same.

Furthermore, I can look anecdotally at the contents of the reviews for these investigations to see if my assumptions about the links between updates and the score itself hold true.

It is worth noting that the final user scores and my system's snapshot of user scores do not directly align, as the system providing user scores discounts reviews that were not made on Steam. User scores up until this explicit, cumulative version have been used as indicators, whereas what is shown below is unbiased by that filtering. The system favours reviews made on products that have been acquired through it in order to discredit external sources and promote what it considers more 'legitimate' use. While there can be nefarious means and motivations in the acquisition and review of a product, the majority of users simply participate in 'bundle culture'.

For one that has improved over time, I selected product '368720', or '90 Minute Fever'.

In [12]:
def create_timeseries_figure(timeseries_data, product_id):
    x = [time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(item['day_start_time'])) for item in timeseries_data]
    y1 = [item['user_score_snapshot'] for item in timeseries_data]
    y2 = [item['total_updates'] for item in timeseries_data]
    y3 = [item['total_reviews'] for item in timeseries_data]
    y4 = [item['total_reviews_voted_up'] for item in timeseries_data]
    y5 = [item['total_reviews_voted_down'] for item in timeseries_data]

    data = [
    go.Scatter(
        x=x,
        y=y1,
        mode='lines',
        name="User Score Snapshot"
    ),
    go.Bar(
        x=x,
        y=y2,
        name="Total Updates",
        xaxis='x',
        yaxis='y2' 
    ),
    go.Bar(
        x=x,
        y=y3,
        name="Total Reviews",
        xaxis='x',
        yaxis='y3'
    )]

    layout = go.Layout(
        title='User Score Timeseries ' + product_id,
        xaxis=dict(
            title="Day Start Time",
            autorange=True,
        ),
        yaxis=dict(
            title="User Score Snapshot",
            range=[
                0, 100
            ],
        ),
        yaxis2=dict(
            title="Total Updates",
            overlaying='y',
            side='right',
            anchor='x'
        ),
        yaxis3=dict(
            title="Total Reviews",
            overlaying='y',
            side='right',
            anchor='x'
        ),
        showlegend=True
    )

    fig = go.Figure(data=data, layout=layout)
    
    return fig

timeseries_368720 = json.load(open('./dumps/timeseries/368720-timeseries.json'))
iplot(create_timeseries_figure(timeseries_368720, "368720"))

We can see that between July 2015 and March 2016, there are clusters of updates shortly followed by spikes in reviews and a positive increase in user scores.

It makes sense that this peters out as time goes on; most users that will ever use the product have already left their reviews, there are less frequent updates and reviews, and the more reviews have been logged, the harder it is to change that score.

For a product whose score did not deviate drastically, we can see a similar scenario. '438100', or 'VR Chat', hit virality around January 2018, but in the run-up to that we can see improvements in the user score demarcated by updates. Although this is not the case for every update, most updates are followed by an increase or local peak in user score, and the user score does not change much during the relative lack of updates between May and September 2017. Updates could still simply coincide with popularity, however.

In [13]:
timeseries_438100 = json.load(open('./dumps/timeseries/438100-timeseries.json'))
iplot(create_timeseries_figure(timeseries_438100, "438100"))

In [14]:
timeseries_444090 = json.load(open('./dumps/timeseries/444090-timeseries.json'))
iplot(create_timeseries_figure(timeseries_444090, "444090"))

In the example of a product that dropped in user score over time, we can see that '335210', or 'Rift's Cave', jumped in user score at the same time as a number of updates. The score and reviews do not change again until there is another period of updates. After a period of no updates the game's score drops; this may suggest that a lack of developer activity on an unchanging product results in a decrease in scores.

In [15]:
timeseries_335210 = json.load(open('./dumps/timeseries/335210-timeseries.json'))
iplot(create_timeseries_figure(timeseries_335210, "335210"))

### Results

- Most products drop in user score over time.
- Most products do not largely deviate from the user score they have at the midpoint of their release cycle.
- Product updates have the capability of increasing the user score of a product. It appears that as opposed to there being a direct impact per update, clusters of updates produce a boost in positive engagement.
- Frequent updates uphold a user score and prevent it from dropping drastically.
- ~10% of products settled exactly on their 'final' review score midway through their lifecycle.

There are lots of moving parts, and some observations warrant further investigation. Correlation is evident, but it is hard to gauge the impetus for a specific change without knowing the entire context of a specific product. Some products are still too new. This makes me wonder if, given a threshold, we could investigate how long it takes products in general to converge to a score; this would be interesting future work. Is that a metric of time, number of reviews, number of updates, or a combination of these?
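
As a starting point for that future work, one possible definition of convergence is sketched below: the first day from which every daily score stays within a chosen threshold of the final score. The function name, the input (a non-empty list of daily scores) and the default threshold of 2 points are all assumptions for illustration, not something measured in this notebook.

```python
def days_to_converge(scores, threshold=2):
    """Index of the first day from which every score stays within
    `threshold` points of the final score.

    scores is a non-empty list of daily user scores (hypothetical input).
    """
    final = scores[-1]
    converged_from = len(scores) - 1

    # Walk backwards from the last day until a score falls outside the band.
    for day in range(len(scores) - 1, -1, -1):
        if abs(scores[day] - final) > threshold:
            break
        converged_from = day

    return converged_from


example_series = [90, 70, 80, 76, 75, 74, 75]
example_convergence = days_to_converge(example_series)
print(example_convergence)
```

The same function could be applied to a cumulative review count or update count instead of a day index, to compare which of the three measures converges most consistently.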

## U4

### Aim

Long periods without updates result in lower review sentiment.

### Purpose

We already know that a low update frequency or a low total number of updates does not necessarily result in a drop in review score; however, the frequency was computed as an average and does not capture patterns over explicit time periods, and earlier graphs have suggested that long gaps between updates do matter.

As a tangent from the previous investigation, in which lots of questions were raised, it seems that observations can be made about the impact of updates based on their timing and frequency rather than their content. I want to verify that there is a consistent trend in which an absence of updates is followed by a drop in review sentiment.

There are assumptions that may be made about a product released under more 'standard' circumstances (content-full), however an early-access product relies on changes to push it towards a state that is ready. It may be the case that certain kinds of products do not follow this trend - a linear story game may not need updating, and the motivation for the developers using the platform cannot be easily ascertained, but we have also seen that most products on the platform set themselves up for iteration, so hopefully this will not be a problem. 

### Investigation

I first look at products for which manual review of the contents of user feedback indicates that this will almost certainly be the case, then see if I can cast a larger net and quantify some of this manual investigation.

I scanned the contents of reviews for keywords such as 'abandonware', 'abandoned', 'promises', 'scam', 'updates' etc. in order to find disenfranchised or embittered users. It is worth noting that these exist for most products (you can't please everyone!), however there were some for which there was a greater proportion. Furthermore, I look for games with low review scores and update frequency. 
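
The keyword scan could be approximated with a sketch like the one below, which reports the share of a product's reviews that mention at least one keyword. The keyword list is taken from the paragraph above; the function name and the plain list-of-strings input are assumptions, and the original scan may have been performed differently.

```python
import re

KEYWORDS = ['abandonware', 'abandoned', 'promises', 'scam', 'updates']

# Whole-word, case-insensitive match on any keyword.
KEYWORD_PATTERN = re.compile(
    r'\b(' + '|'.join(map(re.escape, KEYWORDS)) + r')\b',
    re.IGNORECASE
)


def keyword_share(reviews):
    """Fraction of review texts mentioning at least one keyword."""
    if not reviews:
        return 0.0
    hits = sum(1 for text in reviews if KEYWORD_PATTERN.search(text))
    return hits / len(reviews)


example_reviews = [
    "This game is abandoned.",
    "Great fun!",
    "No UPDATES for years",
    "Broken promises",
]
example_share = keyword_share(example_reviews)
print(example_share)
```

A keyword such as 'updates' also appears in positive reviews, so a share like this is only a filter for manual reading, not a sentiment measure in itself.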

'15540', or '1... 2... 3... KICK IT!' was one of the first 'early access' products on the platform. According to users, it has been neglected in favour of new business ventures by the development team, and they lament the lack of a full release despite its promising start:

> "No updates for a long long time, developers are working on another game. This game is abandoned...
Was a really nice game until they just f\*cked it up." - Recent negative review

> "It's been almost 2 years and the game is showing some improvement. It's not bad, but still far from great." - Positive review from June 2013

For '463920', or 'Initia: Elemental Arena', users speculate that the game has been abandoned since the primary game mode relies on servers that are no longer running, however there are conflicting reports.

> "I know that this is an Early Access game, but for Early Access to work, the devs need to put in some form of effort to actually develop the game further and get it out of Early Access." - Recent negative review

> "I wish if this game get more developed as soon as possible waiting for more updates." - Positive review, June 2016

For '246880', or 'Recruits', there is a similar story of abandonment and ergo fan betrayal.

> "This game could have been really good if the development would have continued." - Recent negative review

> "I just hope the devs would give a bit more frequently feedback and rework the blood-terrible character animations which they are aware of heh :D Good luck!" - Negative review from October 2014

In [16]:
timeseries_15540 = json.load(open('./dumps/timeseries/15540-timeseries.json'))
iplot(create_timeseries_figure(timeseries_15540, "15540"))

timeseries_463920 = json.load(open('./dumps/timeseries/463920-timeseries.json'))
iplot(create_timeseries_figure(timeseries_463920, "463920"))

timeseries_246880 = json.load(open('./dumps/timeseries/246880-timeseries.json'))
iplot(create_timeseries_figure(timeseries_246880, "246880"))

It is clear that the frequency of updates at a specific point in time is influential to the product's overall sentiment. It seems that products drop in sentiment almost unchecked without developer engagement, but a period of early responsiveness from developers allows sentiment to level out at a higher stage. This responsiveness may be indicative of cultivating a group of invested users (fans), and setting a precedent of trust for the lifecycle itself. In fact, most fans of the product suggest changes, and most users cite updates as being either the cause of success or failure for a product in 'early access'.

To further assess this we can use aspects of the previous investigation but with a new, more focused question. We already know which products dropped in user score and which didn't - we can now ask where the greater representation of 'abandoned' products are.

I posit that most games that have not received updates recently will fall under the bracket of those whose review scores have dropped, and furthermore the derivative of this drop will be more significant in games that have not been updated for longer periods of time. 

It is important to note that there is a difference between 'abandoned' and 'finished' when considering product updates as atomic; I expect that some products that have not received many updates recently but still have high review scores will be products for which updates are not imperative, or products that have, by the point of the last update, already been iterated on to the extent that they are near 'completion'. This can be quite subjective.

First, we add to our dataset the time of last update. We need to choose an initial time to look at; a month (four weeks) seems reasonable. Additionally, we can further categorize some games as those having halted in development, using their review midpoint time as a cap.

In [21]:
last_updates = json.load(open('./dumps/last-updates.json'))

# A month prior to data collection.
data_end_range = 1517011200

# Some high-level facts.
update_ranges = [item['update_range'] for item in last_updates]
update_ranges.sort()

print("The average active period is " + str(round(numpy.mean(update_ranges) / 2419200, 1)) + " (months)")
print("The median active period is " + str(round(numpy.median(update_ranges) / 2419200, 1)) + " (months)")
print("The max active period is " + str(round(numpy.amax(update_ranges) / 2419200, 1)) + " (months)")
print("The min active period is " + str(round(numpy.amin(update_ranges) / 86400, 2)) + " (days)")

products_not_updated_in_a_month = []
products_stopped_development = []

products_stopped_development_update_ranges_and_changes = []

for last_update in last_updates:
    if last_update['last_update_time'] < data_end_range:
        products_not_updated_in_a_month.append(last_update['product_id'])
        
    # Find the product's review midpoint.
    for item in user_score_start_end:
        if last_update['product_id'] == item['product_id']:
            if last_update['last_update_time'] < item['review_midpoint_time']:
                products_stopped_development.append(last_update['product_id'])
                
                # Find the product's score change.
                for frequency in update_frequencies:
                    if last_update['product_id'] == frequency['product_id']:
                        products_stopped_development_update_ranges_and_changes.append({
                            'score_change_midpoint_to_present': frequency['score_change_midpoint_to_present'],
                            'update_range': last_update['update_range']     
                        })
                        break
                break

print("Number of products not updated in a month: " + str(len(products_not_updated_in_a_month)))
print("Number of products that appear to have stopped development: " + str(len(products_stopped_development)))

The average active period is 16.1 (months)
The median active period is 12.8 (months)
The max active period is 70.7 (months)
The min active period is 1.18 (days)
Number of products not updated in a month: 422
Number of products that appear to have stopped development: 142
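
The nested loops in the cell above match records by `product_id`. As a rough alternative sketch, the same match can be done with a dictionary index, which avoids rescanning the lists for every product. The helper names and the sample records are made up for illustration, and the sketch assumes each `product_id` appears at most once per list.

```python
def index_by_product_id(records):
    """Map product_id -> record so each lookup is a single dict access."""
    return {record['product_id']: record for record in records}


def find_stopped_products(last_updates, user_score_start_end):
    """Return ids of products whose last update precedes their review midpoint."""
    midpoints = index_by_product_id(user_score_start_end)
    stopped = []
    for last_update in last_updates:
        item = midpoints.get(last_update['product_id'])
        # Products without a known review midpoint are skipped.
        if item is not None and last_update['last_update_time'] < item['review_midpoint_time']:
            stopped.append(last_update['product_id'])
    return stopped


# Made-up sample records, only to show the shape of the data.
sample_last_updates = [
    {'product_id': 1, 'last_update_time': 100},
    {'product_id': 2, 'last_update_time': 500},
    {'product_id': 3, 'last_update_time': 50},
]
sample_midpoints = [
    {'product_id': 1, 'review_midpoint_time': 300},
    {'product_id': 2, 'review_midpoint_time': 300},
]

print(find_stopped_products(sample_last_updates, sample_midpoints))  # [1]
```

Product 3 is left out because it has no midpoint record, which mirrors the original loop, where an unmatched product is never appended.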


About half of the products in the set have not been updated in the past month. Given that they could have been released at any point in the 6 years this section of the platform has been available, and given the average active period, this is to be expected. More interesting are the products that appear to have stopped development: their score changes are lower than those of the overall set.

In [29]:
# May have to make my own score metric.
score_changes = [item['score_change_midpoint_to_present'] for item in products_stopped_development_update_ranges_and_changes]

percentage_dropped = len([item for item in score_changes if item < 0]) / len(score_changes) * 100

print("Percent of products that stopped development and then dropped in user score: " + str(round(percentage_dropped)) + "%")

print("Average score change of products that appear to have stopped development: " + str(numpy.mean(score_changes)))
print("Median score change of products that appear to have stopped development: " + str(numpy.median(score_changes)))

Percent of products that stopped development and then dropped in user score: 84%
Average score change of products that appear to have stopped development: -8.625
Median score change of products that appear to have stopped development: -7.0
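
The comparison with the overall set is stated but not computed in this cell. A minimal sketch of how the two groups could be summarised side by side is given here. The function name and the sample values are hypothetical; in the notebook the inputs would be the `score_change_midpoint_to_present` values for each group.

```python
import numpy


def summarise_score_changes(score_changes):
    """Return percent dropped, mean and median for a list of score changes."""
    changes = numpy.asarray(score_changes, dtype=float)
    if changes.size == 0:
        raise ValueError("no score changes to summarise")
    return {
        'percent_dropped': float(numpy.mean(changes < 0) * 100),
        'mean': float(numpy.mean(changes)),
        'median': float(numpy.median(changes)),
    }


# Hypothetical values, not taken from the dataset.
sample_stopped = [-12, -7, -3, 2]
sample_all = [-12, -7, -3, 2, 0, 4, -1, 5]

print("stopped:", summarise_score_changes(sample_stopped))
print("overall:", summarise_score_changes(sample_all))
```

Applying the same summary to both groups makes the "lower than the overall set" claim directly checkable.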


Finally, we can plot the total active development span against the score change for products that have stopped development, and see that products with shorter development spans are more prone to larger drops in score.

In [19]:
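# Uses the list built in the earlier cell for products that stopped development.
x = [item['update_range'] / 2419200 for item in products_stopped_development_update_ranges_and_changes]
y = [item['score_change_midpoint_to_present'] for item in products_stopped_development_update_ranges_and_changes]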

layout = go.Layout(
    title='Score Change vs. Active Development Span for Products That Stopped Development',
    xaxis=dict(
        title="Update Range (Months)"
    ),
    yaxis=dict(
        title="Score Change Midpoint To Present"
    ),
    showlegend=False
)

data = [go.Scatter(
    x=x,
    y=y,
    mode="markers"
)]

fig = go.Figure(data=data, layout=layout)

iplot(fig)
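
The visual impression from the scatter plot could be backed by a correlation coefficient. This sketch uses Pearson's r via `numpy.corrcoef`, which is a choice made here and not something the notebook does. The function name and the sample records are hypothetical, and Pearson's r only captures a linear relationship.

```python
import numpy

SECONDS_PER_MONTH = 2419200  # four weeks, as elsewhere in the notebook


def span_change_correlation(records):
    """Pearson correlation between active span (months) and score change."""
    months = [r['update_range'] / SECONDS_PER_MONTH for r in records]
    changes = [r['score_change_midpoint_to_present'] for r in records]
    # corrcoef returns a 2x2 matrix; the off-diagonal entry is r.
    return float(numpy.corrcoef(months, changes)[0, 1])


# Made-up records in which longer spans go with smaller drops.
sample_records = [
    {'update_range': 1 * SECONDS_PER_MONTH, 'score_change_midpoint_to_present': -20},
    {'update_range': 2 * SECONDS_PER_MONTH, 'score_change_midpoint_to_present': -15},
    {'update_range': 3 * SECONDS_PER_MONTH, 'score_change_midpoint_to_present': -10},
    {'update_range': 4 * SECONDS_PER_MONTH, 'score_change_midpoint_to_present': -5},
]

print(span_change_correlation(sample_records))
```

A positive coefficient on the real data would be consistent with shorter development spans going with larger drops.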

### Results

Long periods without updates do indeed seem to be associated with lower review sentiment, as do short development periods.

Additionally, it seems we can distinguish products that have effectively completed their development from products that were prematurely abandoned by examining the change in their score after development stopped. This would be useful in further work looking at the shift of products into a fully released state.
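
As a very rough sketch of that classification, a product that has stopped development could be labelled by its score change since stopping. The threshold of -5 points is an arbitrary, hypothetical value chosen only for illustration; it is not derived from the data and would need to be tuned in any real follow-up.

```python
# Hypothetical cut-off: drops larger than this are treated as 'abandoned'.
DROP_THRESHOLD = -5.0


def classify_stopped_product(score_change, drop_threshold=DROP_THRESHOLD):
    """Label a stopped product from its score change since development stopped."""
    if score_change >= drop_threshold:
        return 'completed'
    return 'abandoned'


# Made-up score changes for three stopped products.
for change in (2.0, -3.0, -12.0):
    print(change, classify_stopped_product(change))
```

A single threshold is the simplest possible rule; given how subjective 'completion' is, any such labels are best treated as a starting point for manual review.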