**Overview**

This kernel reviews the debate surrounding Big Data and well-being. We ask four main questions: Is Big Data very new or very old? How well can we now predict individual and aggregate well-being with Big Data, and to what extent do novel measurement tools complement survey-based measures? Is Big Data responsible for the rising interest in well-being, or is it a threat to it?<br>
What are the economic and societal consequences of Big Data, and is there a point to government regulation of ownership, access, and consent?<br>

Academic articles and books on these developments are now plentiful. The term used to describe this data explosion and its Big Brother-type uses, “Big Data”, was cited 40,000 times in 2017 in Google Scholar, about as often as “happiness”! This data explosion was accompanied by the rise of statistical techniques from the field of computer science, in particular machine learning. The latter provided methods to analyse and exploit these large datasets for prediction purposes, justifying the accumulation of increasingly large and detailed data. The term Big Data in this kernel refers to large datasets that contain multiple observations of individuals. Of particular interest is the data gathered on individuals without their “considered consent”. This includes all forms of data that one could gather, if determined, about others without their knowledge, such as visual information and basic demographic and behavioural characteristics. Other examples are Twitter, public Facebook posts, the luminescence of homes, property, etc. Is this information used to say something about well-being, i.e. life satisfaction? How could it be used to affect well-being? And how should it be used? These questions concerning Big Data and well-being (where are we, where could we go, and where should we go) are explored in this kernel.<br>

In [None]:
from IPython.display import Image
import os
!ls ../input/

Image("../input/imageshappiness1/BigData.png")

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns  
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

# Any results you write to the current directory are saved as output.

In [None]:
data2015=pd.read_csv('../input/world-happiness/2015.csv')
#data2016=pd.read_csv('../input/2016.csv')
#data2017=pd.read_csv('../input/2017.csv')

**What do the columns following the Happiness Score (Family, Generosity, etc.) describe?**

The columns GDP per Capita, Family, Life Expectancy, Freedom, Generosity, and Trust (Government Corruption) describe the extent to which each factor contributes to the happiness evaluation in each country. The Dystopia Residual is the Dystopia Happiness Score (1.85) plus the residual, that is, the unexplained value for each country.

If you add all these columns up, you get the Happiness Score, so it may be unreliable to use them as features in a model that predicts the Happiness Score.
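
The additivity claim above can be checked directly. The following is a minimal sketch on a toy table with made-up numbers (not rows from `2015.csv`); the helper name `additivity_gap` and the shortened column names are assumptions for illustration. In the real file the columns carry longer names such as `Economy (GDP per Capita)`.

```python
import pandas as pd

# Shortened, hypothetical column names; the real CSV uses longer ones.
COMPONENTS = ["Economy", "Family", "Health", "Freedom",
              "Trust", "Generosity", "Dystopia Residual"]

# Toy rows with made-up values, chosen so the components sum to the score.
toy = pd.DataFrame({
    "Country": ["A", "B"],
    "Economy": [1.2, 0.4],
    "Family": [1.1, 0.6],
    "Health": [0.9, 0.3],
    "Freedom": [0.6, 0.2],
    "Trust": [0.3, 0.1],
    "Generosity": [0.2, 0.2],
    "Dystopia Residual": [2.5, 1.9],
    "Happiness Score": [6.8, 3.7],
})

def additivity_gap(df, score_col, component_cols):
    """Return, per row, the score minus the sum of its component columns."""
    return df[score_col] - df[component_cols].sum(axis=1)

gap = additivity_gap(toy, "Happiness Score", COMPONENTS)
max_abs_gap = gap.abs().max()
print(max_abs_gap)  # close to zero when the components add up to the score
```

If the same check on the real data also returns a gap near zero, a regression of the score on all seven columns would only recover an identity, which is why such a model says little about what drives happiness.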

In [None]:
# info() prints its summary itself and returns None, so no print() is needed
data2015.info()

#print( data2016.info() )
#since they all have same info()
#print( data2017.info() )



**About The Dataset**

The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril ladder, asks respondents to think of a ladder with the best possible life for them being a 10 and the worst possible life being a 0 and to rate their own current lives on that scale. The scores are from nationally representative samples for the years 2013-2016 and use the Gallup weights to make the estimates representative. The columns following the happiness score estimate the extent to which each of six factors – economic production, social support, life expectancy, freedom, absence of corruption, and generosity – contribute to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors. They have no impact on the total score reported for each country, but they do explain why some countries rank higher than others.<br>
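
As a rough illustration of how survey weights enter a national average, here is a minimal sketch of a weighted mean of Cantril ladder answers. The answers and weights are invented, and the function name `weighted_ladder_mean` is hypothetical; this is not Gallup's actual weighting procedure, only the general idea of a weighted average.

```python
import numpy as np

def weighted_ladder_mean(answers, weights):
    """Weighted mean of 0-10 ladder answers; weights are survey weights."""
    answers = np.asarray(answers, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if answers.shape != weights.shape:
        raise ValueError("answers and weights must have the same length")
    if np.any(answers < 0) or np.any(answers > 10):
        raise ValueError("ladder answers must lie between 0 and 10")
    return float(np.average(answers, weights=weights))

# Invented example: the second respondent stands in for twice as many people.
answers = [7, 5, 8, 3]
weights = [1.0, 2.0, 1.0, 1.0]
national_score = weighted_ladder_mean(answers, weights)
print(national_score)
```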



In [None]:
data2015.head()

In [None]:
print(" 2015 Correlation of data ")
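# Restrict to numeric columns: Country and Region are strings and cannot be correlated
data2015.select_dtypes(include='number').corr()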

#print(" 2016 Correlation of data ")
#print( data2016.corr() )

#print(" 2017 Correlation of data ")
#print( data2017.corr() )

In [None]:
f,ax = plt.subplots(figsize=(18, 18))
# Correlate numeric columns only; Country and Region are strings
sns.heatmap(data2015.select_dtypes(include='number').corr(), annot=True, linewidths=.5, fmt= '.1f',ax=ax)
plt.show()

In short, adolescents who spend more time on electronic devices are less happy, and adolescents who spend more time on most other activities are happier. This creates the possibility that iGen adolescents are less happy because their increased
time on digital media has displaced time that previous generations spent on non-screen activities linked to happiness. In other words,
digital media may have an indirect effect on happiness as it displaces time that could be otherwise spent on more beneficial activities.<br>



In [None]:
from IPython.display import Image
import os
!ls ../input/

Image("../input/imageshappiness1/happy.png")

In [None]:
data2015.head()

**About The Columns**

**Country** Name of the country.<br>
**Region** Region the country belongs to.<br>
**Happiness Rank** Rank of the country based on the Happiness Score.<br>
**Happiness Score** A metric measured in 2015 by asking the sampled people the question: "How would you rate your happiness on a scale of 0 to 10, where 10 is the happiest?"<br>
**Standard Error** The standard error of the Happiness Score.<br>
**Economy (GDP per Capita)** The extent to which GDP contributes to the calculation of the Happiness Score.<br>
**Family** The extent to which Family contributes to the calculation of the Happiness Score.<br>
**Health (Life Expectancy)** The extent to which life expectancy contributes to the calculation of the Happiness Score.<br>
**Freedom** The extent to which Freedom contributes to the calculation of the Happiness Score.<br>
**Trust (Government Corruption)** The extent to which perception of corruption contributes to the calculation of the Happiness Score.<br>
**Generosity** The extent to which Generosity contributes to the calculation of the Happiness Score.<br>
**Dystopia Residual** The extent to which the Dystopia Residual contributes to the calculation of the Happiness Score.<br>
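
To make the link between the rank and the score concrete, here is a small sketch that derives a rank from a score on a toy table. The three rows are invented, and the tie-handling choice (`method="min"`) is an assumption; the source does not say how ties are treated in the real data.

```python
import pandas as pd

# Invented scores for three placeholder countries.
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Happiness Score": [5.1, 7.3, 6.0],
})

# Highest score gets rank 1; tied scores would share the lowest rank (assumption).
toy["Happiness Rank"] = (
    toy["Happiness Score"].rank(ascending=False, method="min").astype(int)
)

ranked = toy.sort_values("Happiness Rank").reset_index(drop=True)
print(ranked)
```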

In [None]:
from IPython.display import Image
import os
!ls ../input/

Image("../input/imageshappiness1/Happiness.png")

In [None]:
data2015.tail()

In [None]:
for each in data2015.columns:
    print( each )

In [None]:

# Rename columns to identifier-friendly names, then index the frame by Happiness Score
fig, axes = plt.subplots(figsize=(10, 10),nrows=2, ncols=2)

data_updated=data2015.rename( index=str ,columns={"Happiness Rank":"Happiness_Rank","Standard Error":"Standard_Error"})
data_2015U=data_updated.rename( index=str ,columns={"Happiness Score":"Happiness_Score"})
data_2015U=data_2015U.rename( index=str,columns={"Economy (GDP per Capita)":"Economy","Dystopia Residual":"Dystopia_Residual","Health (Life Expectancy)":"Health","Trust (Government Corruption)":"Trust"})

# Keep the sorted result (sort_values returns a new frame) and use the score as the x-axis
data_2015U=data_2015U.sort_values(by=['Happiness_Score']).set_index('Happiness_Score')
#print(data_2015U.loc[:,['Country','Happiness_Score']])

# One panel per column; each title names the column that is actually plotted
data_2015U.Standard_Error.plot(ax=axes[0,0],kind = 'line', color = 'red',title = 'Standard Error',linewidth=1,grid = True,linestyle = ':')
data_2015U.Family.plot( ax=axes[0,1],kind='line' ,color='green' ,title='Family' ,linewidth=1 , grid=True ,linestyle=':' )
data_2015U.Economy.plot( ax=axes[1,0],kind='line' ,color='yellow', title='Economy',linewidth=1,grid=True ,linestyle=':' )
data_2015U.Health.plot( ax=axes[1,1],kind='line' ,color='blue', title='Health',linewidth=1,grid=True ,linestyle=':' )
plt.show()


In [None]:

#fig, axes = plt.subplots(figsize=(10, 10),nrows=2, ncols=2)

data_updated=data2015.rename( index=str ,columns={"Happiness Rank":"Happiness_Rank"})
data_2015U=data_updated.rename( index=str ,columns={"Happiness Score":"Happiness_Score"})
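# Also rename Dystopia Residual and Standard Error: the melt cell further down expects Dystopia_Residual
data_2015U=data_2015U.rename( index=str,columns={"Economy (GDP per Capita)":"Economy","Dystopia Residual":"Dystopia_Residual","Standard Error":"Standard_Error","Health (Life Expectancy)":"Health","Trust (Government Corruption)":"Trust"})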


f,ax = plt.subplots(figsize=(30, 30))

Western_Europe=data_2015U[ data_2015U.Region=='Western Europe']
North_America=data_2015U[ data_2015U.Region=='North America']
Australian_New_Zealand=data_2015U[ data_2015U.Region=='Australia and New Zealand']
Middle_East_and_Northern_Africa=data_2015U[ data_2015U.Region=='Middle East and Northern Africa']
Latin_America_and_Caribbean=data_2015U[ data_2015U.Region=='Latin America and Caribbean']
Southeastern_Asia=data_2015U[ data_2015U.Region=='Southeastern Asia']
Central_and_Eastern_Europe=data_2015U[ data_2015U.Region=='Central and Eastern Europe']
Eastern_Asia=data_2015U[ data_2015U.Region=='Eastern Asia']
#Sub_Saharan_Africa=data_2015U[ data_2015U.Region=='Sub Saharan Africa']
Southern_Asia=data_2015U[ data_2015U.Region=='Southern Asia']


# Draw each region once in its own colour, then label every point with its country name.
# zip() walks the rows in order, so this does not depend on how the frame is indexed.
regions = [
    (Central_and_Eastern_Europe, 'magenta'),
    (Southern_Asia, 'yellow'),
    (Western_Europe, 'red'),
    (North_America, 'blue'),
    (Middle_East_and_Northern_Africa, 'purple'),
]

for region, color in regions:
    plt.scatter(region.Happiness_Score, region.Freedom, color=color, linewidth=1)
    for x, y, country in zip(region.Happiness_Score, region.Freedom, region.Country):
        plt.text(x, y, country, fontsize=15)

plt.title("Happiness Score-Freedom Scatter Plot")
plt.xlabel("Happiness Score",fontsize=20)
plt.ylabel("Freedom",fontsize=20)







In [None]:
melted = pd.melt(frame=data_2015U,id_vars = 'Country', value_vars= ['Generosity','Dystopia_Residual'])
melted.loc[:10]

In [None]:
data_2015U1=data_2015U.head()
data_2015U2=data_2015U.tail()

concat_data_row=pd.concat([data_2015U1,data_2015U2],axis=0,ignore_index=True)

concat_data_row

In [None]:
data1 = data_2015U.loc[:,["Health","Trust","Freedom"]]
data1.plot()

In [None]:
data1.plot(subplots = True)
plt.show()

In [None]:

fig, axes = plt.subplots(nrows=2, ncols=2)

data_2015U.plot(ax=axes[0,0],kind = "scatter",x="Happiness_Score",y="Freedom",color="blue")
data_2015U.plot(ax=axes[0,1],kind = "scatter",x="Happiness_Score",y="Family",color="red")
data_2015U.plot(ax=axes[1,0],kind = "scatter",x="Happiness_Score",y="Economy",color="yellow")
data_2015U.plot(ax=axes[1,1],kind = "scatter",x="Happiness_Score",y="Generosity",color="pink")

In [None]:
fig, axes = plt.subplots(nrows=2,ncols=1)
# Happiness scores lie on a 0-10 scale; density=True replaces the removed `normed` argument
data_2015U.plot(kind = "hist",y = "Happiness_Score",bins = 50,range= (0,10),density = True,ax = axes[0])
data_2015U.plot(kind = "hist",y = "Happiness_Score",bins = 50,range= (0,10),density = True,ax = axes[1],cumulative = True)
#plt.savefig('graph.png')
plt.show()