Fat-Tail Distribution in the VPN Market and its Implications for Long-Term Lokinet Monetization Strategies #56

Open
venezuela01 opened this issue Aug 30, 2023 · 2 comments


venezuela01 commented Aug 30, 2023

Fat-Tail Distribution in the VPN Market and its Implications for Long-Term Lokinet Monetization Strategies

The VPN industry, valued at approximately $50 billion, presents Lokinet with an opportunity to revolutionize the sector by significantly reducing metadata leaks and enhancing user privacy. This analysis aims to guide Lokinet's long-term monetization strategies.

Measuring Market Share Distribution

To gauge how market share is distributed in the VPN industry, we use the rank-frequency distribution as a proxy, where frequency is the number of recent downloads of each Android VPN app.

Data Collection

Data on recent downloads of Android VPN apps was obtained from the estimatedRecentDownloads field of the AppBrain API. A search for VPN on the AppBrain website returns over 9,400 results, but because of API limitations, data collection was restricted to the first 550 results.

Data Analysis and Model Fitting

When the data are plotted on a log-log scale, the points fall close to a straight line, which indicates a power-law relationship. Fitting a power function (by linear regression on the log-transformed data) yielded the model:

estimatedRecentDownloads = 87356508 * ranking ^ -1.259

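[Figure: Estimated recent downloads of Android VPN apps vs. ranking (log-log scale), original data and fitted power-law curve]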

The fit has an R² of 0.986, computed on the log-transformed data of the 550 sampled apps, which indicates a close fit.
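The log-log fit can be reproduced with NumPy alone. This minimal sketch fits a straight line to log(count) versus log(rank) with `np.polyfit`, instead of the scikit-learn `LinearRegression` used in the author's script, and reports R² on the log scale. Because the AppBrain data are not included here, it runs on synthetic counts generated from the reported model plus random multiplicative noise; those numbers are illustrative only, not results. The function name `fit_power_law` is hypothetical.

```python
import numpy as np


def fit_power_law(ranks, counts):
    """Fit counts ~ K * ranks**alpha by least squares on log-log axes.

    Returns (K, alpha, r2), where r2 is computed on the log-transformed data.
    """
    log_x = np.log(np.asarray(ranks, dtype=float))
    log_y = np.log(np.asarray(counts, dtype=float))
    # A degree-1 polynomial fit returns [slope, intercept]
    alpha, log_k = np.polyfit(log_x, log_y, 1)
    residuals = log_y - (log_k + alpha * log_x)
    r2 = 1.0 - np.sum(residuals**2) / np.sum((log_y - log_y.mean()) ** 2)
    return float(np.exp(log_k)), float(alpha), float(r2)


# Synthetic check (NOT the AppBrain data): generate counts from the reported
# model with multiplicative log-normal noise and see whether the fit recovers it.
rng = np.random.default_rng(0)
ranks = np.arange(1, 551)
clean = 87356508 * ranks ** -1.259
noisy = clean * np.exp(rng.normal(0.0, 0.2, size=ranks.size))

K, alpha, r2 = fit_power_law(ranks, noisy)
print(f"K = {K:.0f}, alpha = {alpha:.3f}, R^2 = {r2:.3f}")
```

On noise-free input the fit returns the generating parameters exactly (up to floating-point error); with noise, the slope stays close to -1.259.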

Implications of Fat-Tail Distribution

This model follows a discrete form of the power law, known as the generalized Zipf distribution. Its exponent, -1.259, indicates a fat-tailed distribution: the closer the exponent is to -1, the fatter the tail. When the exponent reaches -1, the generalized Zipf distribution becomes the harmonic series, whose sum diverges; the combined share of any fixed number of top players then shrinks toward zero as the number of players grows, so the tail outweighs the head, contradicting the 80/20 rule. Similar patterns are often observed in search query distributions.
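To make the role of the exponent concrete, the following sketch computes the share of the total held by the top 10 ranks under a truncated Zipf law, where rank r has weight r^(-s). It compares the fitted exponent (s = 1.259) with the harmonic case (s = 1). It assumes, purely for illustration, that the fitted power law extends to 10^3 and 10^6 providers, well beyond the 550 apps actually sampled; the function name `top_k_share` is hypothetical.

```python
import numpy as np


def top_k_share(s, k, n):
    """Share of the total held by ranks 1..k when the weight of rank r is
    r**(-s), for r = 1..n (a truncated generalized Zipf distribution)."""
    weights = np.arange(1, n + 1, dtype=float) ** -s
    return weights[:k].sum() / weights.sum()


# Fitted exponent (s = 1.259) vs. the harmonic case (s = 1)
for s in (1.259, 1.0):
    shares = [top_k_share(s, 10, n) for n in (10**3, 10**6)]
    print(f"s = {s}: top-10 share with 1e3 providers = {shares[0]:.3f}, "
          f"with 1e6 providers = {shares[1]:.3f}")
```

In the harmonic case the top-10 share keeps falling as more providers are added, because the total diverges; with s = 1.259 the total converges, so the share levels off at a positive value.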

In other words, in the VPN market the tail is as significant as the head. This contrasts with many other industries that follow the 80/20 rule, but aligns with patterns commonly observed in the search engine sector. Because the data collection step ranks Android VPN apps by recent downloads, 'head' here refers to the major players and 'tail' to smaller VPN service providers.

VPN Market Characteristics

We offer an explanation for the fat-tail phenomenon in the VPN market:

Due to low entry barriers and highly diverse user demands influenced by factors like location, language, target regions, and various marketing channels, the VPN market demonstrates both a decentralized structure and a fat-tail distribution. This contrasts with many other industries, where a handful of top players dominate the market.

Strategic Insights for Lokinet

Coincidentally, the fat-tail distribution in the VPN market aligns well with Lokinet's decentralized ethos and anti-censorship mission. Whether approached from a revenue-generation perspective or a value-proposition angle, the optimal business strategy should target both the head and the tail of the market.

Proposed Long-term Lokinet Monetization Strategy

Given the fat-tail distribution of the VPN market, and to minimize marketplace leakage, we advocate a nominal-fee business model for Lokinet, preferably with speed-based pricing. To keep entry barriers low, broaden Lokinet's user base, and enlarge Lokinet's anonymity set, we recommend offering free access at limited speeds, with tiered pricing for faster service.
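Speed-based pricing needs some way to cap throughput per tier. One standard technique is a token bucket, sketched here under stated assumptions. The tier names and rates in `TIERS`, and the helpers `TokenBucket` and `bucket_for_tier`, are hypothetical; this is not how Lokinet enforces limits, since the protocol-level cryptographic enforcement this proposal calls for is out of scope. Because the bucket only counts bytes, the same limiter would apply whether traffic goes to an Exit node or stays on the Lokinet darknet.

```python
# HYPOTHETICAL speed tiers in bytes per second; illustrative values only,
# not part of any Lokinet specification.
TIERS = {
    "free": 125_000,        # ~1 Mbit/s
    "basic": 1_250_000,     # ~10 Mbit/s
    "premium": 12_500_000,  # ~100 Mbit/s
}


class TokenBucket:
    """Token-bucket rate limiter: tokens refill at `rate` per second up to
    `burst`; sending n bytes spends n tokens."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = float(rate)    # refill rate, bytes per second
        self.burst = float(burst)  # bucket capacity, bytes
        self.tokens = self.burst   # start with a full bucket
        self.last = now            # time of the last refill, seconds

    def allow(self, nbytes, now):
        """Return True (and spend tokens) if nbytes may be sent at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False


def bucket_for_tier(tier):
    # One second of burst at the tier's rate (an arbitrary choice for this sketch)
    rate = TIERS[tier]
    return TokenBucket(rate=rate, burst=rate)


# Usage: a free-tier user may send one second's worth of data at once,
# then must wait for tokens to refill.
free = bucket_for_tier("free")
print(free.allow(125_000, now=0.0), free.allow(1, now=0.0))
```

Paying for a higher tier would simply swap in a bucket with a larger rate; the enforcement logic is identical for every tier.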

As a side note, once Lokinet's ecosystem grows, traffic on Lokinet's darknet could eventually rival the volume of clearnet traffic routed through Lokinet Exit nodes. One advantage of speed-based pricing is that it allows Lokinet Exit and the Lokinet darknet to operate under a single, unified business model.

This long-term vision does not contradict short-to-mid-term strategies such as Enhancing the Awareness, Utility, and Anonymity Set of Oxen - Part 11 - Promoting Lokinet Exit Marketplace through Preview Generation and In-app Navigation of Clearnet URLs. It is common to bootstrap a business with one model while gradually migrating or expanding to another model.

The strategy entails cryptographic enforcement at the Lokinet protocol level and is most effective when payments are transacted in privacy-centric cryptocurrencies like Oxen or Monero. Technical execution specifics, however, are outside the scope of this analysis.

@venezuela01
Author

Script for Data Collection

The following script fetches the data used in this analysis, sorted by RECENT_DOWNLOADS. It requests 11 pages of 50 results (offsets 0 through 500), i.e. the first 550 results:

```bash
#!/usr/bin/env bash
# Requires bash 4+ for the stepped brace expansion {0..500..50}
API_URL="https://api.appbrain.com/v2/info/search"
API_KEY="<api_key>"
QUERY="vpn"
SORT="RECENT_DOWNLOADS"
LIMIT="50"

# Offsets 0, 50, ..., 500: one JSON file per page of 50 results
for OFFSET in {0..500..50}; do
    FILENAME="${OFFSET}.json"
    curl -X GET --header 'Accept: application/json' "${API_URL}?apikey=${API_KEY}&query=${QUERY}&sort=${SORT}&offset=${OFFSET}&limit=${LIMIT}" > "$FILENAME"
    sleep 1  # One-second pause to avoid rate limiting
done
```
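The analysis script in the next comment reads a single `merged_data.csv`, while this script writes one JSON file per page, and the merge step is not shown. The following is a minimal sketch of one way to do it. It assumes each page stores its app list under an `apps` key and that each app record has `package` and `estimatedRecentDownloads` fields; these key names are assumptions to verify against a real AppBrain response. The function `merge_pages` is hypothetical.

```python
import csv
import json
from pathlib import Path

# ASSUMPTION: key names in the AppBrain JSON response; check before use.
APPS_KEY = "apps"
ID_FIELD = "package"
DOWNLOADS_FIELD = "estimatedRecentDownloads"


def merge_pages(page_dir, out_csv="merged_data.csv"):
    """Combine the <offset>.json pages in page_dir into one CSV, one row per app.

    Returns the number of distinct apps written.
    """
    rows = {}
    # Sort numerically by offset (0, 50, 100, ...) to keep the API's order;
    # assumes the directory holds only <offset>.json page files.
    for page in sorted(Path(page_dir).glob("*.json"), key=lambda p: int(p.stem)):
        data = json.loads(page.read_text())
        for app in data.get(APPS_KEY, []):
            # Keep the first occurrence if an app appears on more than one page
            rows.setdefault(app[ID_FIELD], app[DOWNLOADS_FIELD])
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([ID_FIELD, DOWNLOADS_FIELD])
        for app_id, downloads in rows.items():
            writer.writerow([app_id, downloads])
    return len(rows)
```

Running `merge_pages(".")` in the directory where the script saved its pages would produce a CSV with the `estimatedRecentDownloads` column the analysis expects.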

@venezuela01
Author

venezuela01 commented Aug 30, 2023

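The power-function model and chart referenced above were generated with the following code. It reads merged_data.csv, which is expected to contain one row per app with an estimatedRecentDownloads column: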

```python
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import plotly.graph_objects as go

pd.options.display.max_columns = None

# 1. Load the dataframe
df = pd.read_csv('merged_data.csv')

# 2. Drop apps with zero downloads (log is undefined there), then sort by
#    `estimatedRecentDownloads`, largest first
df_clean = (df[df['estimatedRecentDownloads'] > 0]
            .sort_values(by='estimatedRecentDownloads', ascending=False)
            .reset_index(drop=True))

# 3. Create a new column called `ranking` (1 = most downloaded)
df_clean['ranking'] = range(1, len(df_clean) + 1)

# 4. Log-transform both axes: the power law y = K * x^alpha becomes
#    the straight line log(y) = log(K) + alpha * log(x)
x_data = np.log(df_clean['ranking'].values)
y_data = np.log(df_clean['estimatedRecentDownloads'].values)

# 5. Apply linear regression to the transformed data
reg = LinearRegression().fit(x_data.reshape(-1, 1), y_data)

# Extract the exponent (alpha = slope) and the scale factor (K = exp(intercept))
alpha = reg.coef_[0]
K = np.exp(reg.intercept_)

# 6. Convert the linear regression line back to the power function on the original scale
x_original = df_clean['ranking'].values
y_pred = K * x_original ** alpha
print(f'estimatedRecentDownloads = {K} * ranking ^ {alpha}')

# R^2 on the log-transformed data; r2_score expects (y_true, y_pred) in that order
r2 = r2_score(y_data, np.log(y_pred))
print(f'R^2 of the fit: {r2}')

# Combine the statements into one annotation string
annotation_text = (f'estimatedRecentDownloads = {K:.4f} * ranking ^ {alpha:.4f}'
                   f'<br><br>R^2 of the fit: {r2:.4f}')

# 7. Plot with Plotly on a log-log scale
trace1 = go.Scatter(x=x_original, y=df_clean['estimatedRecentDownloads'].values,
                    mode='markers', name='Original Data')
trace2 = go.Scatter(x=x_original, y=y_pred, mode='lines', name='Fitted Curve')

layout = go.Layout(
    title="Estimated Recent Downloads of Android VPN apps (Log-Log scale)",
    xaxis=dict(title='Ranking', type='log'),
    yaxis=dict(title='Estimated Recent Downloads', type='log'),
    annotations=[
        dict(
            x=0.95,
            y=0.95,
            xref='paper',
            yref='paper',
            text=annotation_text,
            showarrow=False,
            align='right',  # Align text to the right
        )
    ],
)

fig = go.Figure(data=[trace1, trace2], layout=layout)
fig.show()
```
