**Problem Statement**

In this project, you are a data journalist for a renowned publication. Your editor-in-chief has given
you access to a database that covers a range of topics. You will write a 500- to 800-word article on a topic of your choosing.

Your editor will curate the best articles from your cohort of data journalists and publish them.

Your article should answer one of the following questions:

● What is a high-level diagnosis of the sector you’re studying?

● What are the key facts people need to know about this sector?

● What are the key trends worth noting? You are welcome to do a little research to correlate these trends with events or facts about the sector, as long as you remain objective.

● What recommendations would you have for the sector in question? These need to be based on the database.


In [None]:
# Import pandas for data manipulation
import pandas as pd

# Import numpy for numerical computations
import numpy as np

# Import plotly.express, which contains plotly.py's core functionality,
# for data visualisation
import plotly.express as px

In [None]:
# Dataset URL (CSV) = https://bit.ly/MobileDataset

# Read the data and preview the first five rows

mobile_df = pd.read_csv('https://bit.ly/MobileDataset')
mobile_df.head()

Unnamed: 0,Series Name,Series Code,Country Name,Country Code,1995 [YR1995],1996 [YR1996],1997 [YR1997],1998 [YR1998],1999 [YR1999],2000 [YR2000],...,2007 [YR2007],2008 [YR2008],2009 [YR2009],2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015],2016 [YR2016]
0,Internet users (per 100 people),IT.NET.USER.P2,Sub-Saharan Africa,SSF,..,0.068722178,0.128779789,0.233989844,0.359684995,0.501116248,...,3.670158994,5.616740719,6.922015421,9.775497398,12.10872903,14.61288055,17.1216113,19.57576513,22.38805971,..
1,Internet users (per 100 people),IT.NET.USER.P2,Angola,AGO,..,0.000775929,0.005673746,0.018453724,0.071964087,0.105045562,...,1.7,1.9,2.3,2.8,3.1,6.5,8.9,10.2,12.4,..
2,Internet users (per 100 people),IT.NET.USER.P2,Benin,BEN,..,0.0016931,0.024648796,0.047877304,0.154906741,0.225247851,...,1.79,1.85,2.24,3.13,4.148323066,4.5,4.9,6.0,6.787702956,..
3,Internet users (per 100 people),IT.NET.USER.P2,Botswana,BWA,0.064521041,0.15745423,0.30774685,0.602400929,1.122391474,2.902666622,...,5.28,6.25,6.15,6.0,8.0,11.5,15.0,18.5,27.5,..
4,Internet users (per 100 people),IT.NET.USER.P2,Burkina Faso,BFA,..,0.000960519,0.018686353,0.045423484,0.061780293,0.077080169,...,0.75,0.92,1.13,2.4,3.0,3.725034916,9.1,9.4,11.38764617,..


In [None]:
# Normalise column names (lowercase, trim whitespace, spaces to underscores), then display them

mobile_df.columns = mobile_df.columns.str.lower().str.strip().str.replace(" ", "_")
mobile_df.columns

Index(['series_name', 'series_code', 'country_name', 'country_code',
       '1995_[yr1995]', '1996_[yr1996]', '1997_[yr1997]', '1998_[yr1998]',
       '1999_[yr1999]', '2000_[yr2000]', '2001_[yr2001]', '2002_[yr2002]',
       '2003_[yr2003]', '2004_[yr2004]', '2005_[yr2005]', '2006_[yr2006]',
       '2007_[yr2007]', '2008_[yr2008]', '2009_[yr2009]', '2010_[yr2010]',
       '2011_[yr2011]', '2012_[yr2012]', '2013_[yr2013]', '2014_[yr2014]',
       '2015_[yr2015]', '2016_[yr2016]'],
      dtype='object')

In [None]:
# Inspect the column data types. Pay close attention to the units of measure used.

mobile_df.dtypes

series_name      object
series_code      object
country_name     object
country_code     object
1995_[yr1995]    object
1996_[yr1996]    object
1997_[yr1997]    object
1998_[yr1998]    object
1999_[yr1999]    object
2000_[yr2000]    object
2001_[yr2001]    object
2002_[yr2002]    object
2003_[yr2003]    object
2004_[yr2004]    object
2005_[yr2005]    object
2006_[yr2006]    object
2007_[yr2007]    object
2008_[yr2008]    object
2009_[yr2009]    object
2010_[yr2010]    object
2011_[yr2011]    object
2012_[yr2012]    object
2013_[yr2013]    object
2014_[yr2014]    object
2015_[yr2015]    object
2016_[yr2016]    object
dtype: object
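
The unit of measure is embedded in each `series_name` (for example "per 100 people"), so one way to keep track of units is to pull out the parenthesised part of the name. The following is a minimal sketch, assuming only the five series names visible in the output previews of this notebook; `series_names`, `units`, and the fallback label for names without parentheses are illustrative choices, not part of the dataset.

```python
import pandas as pd

# Series names copied from the previews in this notebook (the full dataset may hold more).
series_names = pd.Series(
    [
        "Internet users (per 100 people)",
        "Fixed broadband subscriptions (per 100 people)",
        "Mobile cellular subscriptions",
        "ICT service exports (BoP, current US$)",
        "Fixed telephone subscriptions",
    ],
    name="series_name",
)

# Capture the text inside a trailing pair of parentheses, if there is one.
units = series_names.str.extract(r"\(([^)]*)\)\s*$")[0]

# Hypothetical placeholder for series whose name states no unit (likely raw counts).
units = units.fillna("not stated in name")

print(pd.DataFrame({"series_name": series_names, "unit": units}))
```

Series measured per 100 people and series reported as absolute counts or in US dollars are not directly comparable, so it may be safer to analyse each unit group separately.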

**Data Analysis**

In [None]:
# Check for missing values (note: the previews show missing entries encoded as "..", which isnull() does not count)

mobile_df.isnull().sum()

series_name      0
series_code      0
country_name     0
country_code     0
1995_[yr1995]    0
1996_[yr1996]    0
1997_[yr1997]    0
1998_[yr1998]    0
1999_[yr1999]    0
2000_[yr2000]    0
2001_[yr2001]    0
2002_[yr2002]    0
2003_[yr2003]    0
2004_[yr2004]    0
2005_[yr2005]    0
2006_[yr2006]    0
2007_[yr2007]    0
2008_[yr2008]    0
2009_[yr2009]    0
2010_[yr2010]    0
2011_[yr2011]    0
2012_[yr2012]    0
2013_[yr2013]    0
2014_[yr2014]    0
2015_[yr2015]    0
2016_[yr2016]    0
dtype: int64
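
The zero counts above are probably misleading: the previews show `..` where a value is missing, and pandas reads that placeholder as an ordinary string, which would also explain why every year column has dtype `object`. The following is a minimal sketch of one way to expose those gaps, using a small hand-built frame (`sample_df`, with values copied from the preview rows) as a stand-in for `mobile_df`; the variable names are illustrative.

```python
import pandas as pd

# Small stand-in for mobile_df; values copied from the preview rows above.
sample_df = pd.DataFrame(
    {
        "country_name": ["Angola", "Botswana"],
        "1995_[yr1995]": ["..", "0.064521041"],
        "2015_[yr2015]": ["12.4", "27.5"],
        "2016_[yr2016]": ["..", ".."],
    }
)

# Year columns follow the pattern "<year>_[yr<year>]" after the renaming step.
year_cols = [col for col in sample_df.columns if "[yr" in col]

# Convert the year columns to floats; anything unparseable (including "..") becomes NaN.
cleaned_df = sample_df.copy()
cleaned_df[year_cols] = cleaned_df[year_cols].apply(pd.to_numeric, errors="coerce")

# Missing values are now visible to isnull().
missing_counts = cleaned_df[year_cols].isnull().sum()
print(missing_counts)
print(cleaned_df.dtypes)
```

Note that `errors="coerce"` turns every unparseable string into NaN, not only `..`. An alternative is to pass `na_values=".."` to `pd.read_csv` so the placeholder is treated as missing at load time.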

In [None]:
# Check for duplicate rows
mobile_df.duplicated().sum()

0

In [None]:
# Inspect a random sample of five rows
mobile_df.sample(5)

Unnamed: 0,series_name,series_code,country_name,country_code,1995_[yr1995],1996_[yr1996],1997_[yr1997],1998_[yr1998],1999_[yr1999],2000_[yr2000],...,2007_[yr2007],2008_[yr2008],2009_[yr2009],2010_[yr2010],2011_[yr2011],2012_[yr2012],2013_[yr2013],2014_[yr2014],2015_[yr2015],2016_[yr2016]
192,Fixed broadband subscriptions (per 100 people),IT.NET.BBND.P2,Mali,MLI,..,..,..,..,..,..,...,0.025146105,0.04012696,0.060799617,0.057336067,0.058834395,0.029083913,0.019011022,0.018962183,0.021520935,..
347,Mobile cellular subscriptions,IT.CEL.SETS,Gabon,GAB,4000,6800,9500,9694,8891,120000,...,1169000,1300000,1450000.0,1610000.0,2370227.0,2557728.0,2745229.0,2932731,2958082,..
43,Internet users (per 100 people),IT.NET.USER.P2,Swaziland,SWZ,0.00103175,0.050399009,0.088545986,0.096099605,0.4707176,0.926191777,...,4.1,6.85,8.94,11.04,18.13,20.78178258,24.7,27.1,30.38270066,..
406,"ICT service exports (BoP, current US$)",BX.GSR.CCIS.CD,Guinea-Bissau,GNB,..,2060000,2320000,..,..,..,...,3500021.908,5220058.284,16970143.76,24360147.55,28718315.63,13615338.95,19936036.06,..,..,..
283,Fixed telephone subscriptions,IT.MLT.MAIN,Central African Republic,CAF,8385,9704,9814,9563,9860,9468,...,..,..,3561.0,929.0,815.0,823.0,800.0,800,1000,..
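
To study trends over time, it can help to reshape the wide table (one column per year) into a long one (one row per country, series, and year). The following is a minimal sketch under stated assumptions: `wide_df` is a small hand-built stand-in for `mobile_df` with values copied from the previews, and the names `long_df`, `year`, and `value` are illustrative.

```python
import pandas as pd

# Small stand-in for mobile_df; values copied from the preview rows above.
wide_df = pd.DataFrame(
    {
        "series_name": ["Internet users (per 100 people)"] * 2,
        "country_name": ["Angola", "Botswana"],
        "2014_[yr2014]": ["10.2", "18.5"],
        "2015_[yr2015]": ["12.4", "27.5"],
        "2016_[yr2016]": ["..", ".."],
    }
)

year_cols = [col for col in wide_df.columns if "[yr" in col]

# Wide to long: one row per (series, country, year).
long_df = wide_df.melt(
    id_vars=["series_name", "country_name"],
    value_vars=year_cols,
    var_name="year",
    value_name="value",
)

# Keep the leading four digits as the year and convert values to floats.
long_df["year"] = long_df["year"].str.slice(0, 4).astype(int)
long_df["value"] = pd.to_numeric(long_df["value"], errors="coerce")

# Drop rows whose value was the ".." placeholder.
long_df = long_df.dropna(subset=["value"]).reset_index(drop=True)
print(long_df)
```

A long table like this can be passed to `px.line` with `x="year"`, `y="value"`, and `color="country_name"` to plot one line per country.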
