# Programming Project - Unit 3,3
*by Igor A. Brandão and Leandro Antonio Feliciano da Silva*

**Goals**
The purpose of this project is to explore the following:

- Access Health Graph API - Runkeeper content;
- Full content of the statistical part seen in the course;
- Graphs generation;
- Geolocation analysis and hypotheses should be explained in detail;
- Web scraping.

<hr>

# Global Imports section

Import the libraries needed to handle:

- Geocoding;
- Maps;
- File input;
- Heatmap;
- Bokeh charts;
- NumPy library;
- Tqdm progress bar;
- Requests;
- urlopen;
- HTTPError;
- BeautifulSoup;
- Regular expressions.

In [None]:
### Libraries required to run this IPython Notebook
!pip install geocoder
!pip install folium
!pip install tqdm
!pip install tabulate
!pip install pandas-datareader

In [1]:
# Import pandas
import pandas as pd

# Import google geocoder
import geocoder as gc

# Import numpy library
import numpy as np

# Import folium heatmap
import folium
from folium.plugins import HeatMap

# Import tqdm progress bar plugin
from tqdm import tqdm

# Import bokeh libraries
from bokeh.plotting import figure
from bokeh.charts import Bar, Histogram, Donut, BoxPlot, Line, output_notebook, show
from bokeh.layouts import row, gridplot, column
from bokeh.models import HoverTool
from bokeh.charts.attributes import cat, color
from bokeh.charts.operations import blend
from bokeh.charts.attributes import AttrSpec, ColorAttr, MarkerAttr, CatAttr
from bokeh.plotting import ColumnDataSource

# Import request libraries
from urllib.request import Request, urlopen
from urllib.error import HTTPError

# Import web scraping libraries
from bs4 import BeautifulSoup
import re # regular expression

# Import API libraries
import requests
import json
from pandas.io.json import json_normalize

# Imports to output the result as a Markdown
from IPython.display import display, Markdown
from tabulate import tabulate

<hr>

# I - API section

## API data retrieving

#### In the cell below, we connect to the Runkeeper Health Graph API.

In [2]:
# Access token
ACCESS_TOKEN = '25bc30d6dd6f4b99bbeb48e8619103b4'

# Base URI
api_URI = "http://api.runkeeper.com/fitnessActivities"

# Number of results
pageSize = 300

# Final URI
url = '%s?pageSize=%s&access_token=%s' % \
    (api_URI, pageSize, ACCESS_TOKEN,)

# print(url)

# Receive the results from API
api_content = requests.get(url).json()

print(json.dumps(api_content, indent=1))

{
 "size": 245,
 "items": [
  {
   "utc_offset": -3,
   "duration": 1037,
   "start_time": "Fri, 23 Jun 2017 12:40:11",
   "total_calories": 148,
   "tracking_mode": "outdoor",
   "total_distance": 5345.75479639154,
   "entry_mode": "API",
   "has_path": true,
   "source": "RunKeeper",
   "type": "Cycling",
   "uri": "/fitnessActivities/1005956310"
  },
  {
   "utc_offset": -3,
   "duration": 1140,
   "start_time": "Fri, 23 Jun 2017 07:50:54",
   "total_calories": 143,
   "tracking_mode": "outdoor",
   "total_distance": 6081.16390011933,
   "entry_mode": "API",
   "has_path": true,
   "source": "RunKeeper",
   "type": "Cycling",
   "uri": "/fitnessActivities/1005843765"
  },
  {
   "utc_offset": -3,
   "duration": 1285,
   "start_time": "Thu, 22 Jun 2017 12:41:16",
   "total_calories": 184,
   "tracking_mode": "outdoor",
   "total_distance": 6589.61628038915,
   "entry_mode": "API",
   "has_path": true,
   "source": "RunKeeper",
   "type": "Cycling",
   "uri": "/fitnessActivities/10055
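The request URL above is assembled with `%` formatting. As a minimal sketch, the same query string can be built with the standard library's `urllib.parse.urlencode`, which also escapes special characters. The helper name `build_activities_url` and the `RUNKEEPER_TOKEN` environment variable are assumptions made for illustration and are not part of the original notebook; reading the token from the environment simply keeps it out of the notebook source.

```python
import os
from urllib.parse import urlencode, urlsplit, parse_qs

API_URI = "http://api.runkeeper.com/fitnessActivities"

def build_activities_url(token, page_size=300, base=API_URI):
    """Return the fitnessActivities URL with an encoded query string (hypothetical helper)."""
    query = urlencode({"pageSize": page_size, "access_token": token})
    return "%s?%s" % (base, query)

# Assumption: the token lives in an environment variable; a placeholder is used otherwise
token = os.environ.get("RUNKEEPER_TOKEN", "YOUR_ACCESS_TOKEN")
url = build_activities_url(token)

# Round-trip check: parse the query string back into a dict of lists
params = parse_qs(urlsplit(url).query)
print(params["pageSize"])
```

The resulting `url` can be passed to `requests.get(url).json()` exactly as in the cell above.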

## JSON to Data Frame conversion

To make the data easier to manipulate, the next cell converts the JSON content imported from the API into a pandas data frame.

In [3]:
# Perform a conversion from JSON to Data Frame
api_df = json_normalize(api_content['items'])

# Convert the duration from seconds to minutes
api_df['duration'] = api_df['duration']/60
api_df['duration'] = api_df['duration'].round(2)

# Round the distance
api_df['total_distance'] = api_df['total_distance'].round(2)

api_df

| | duration | entry_mode | has_path | source | start_time | total_calories | total_distance | tracking_mode | type | uri | utc_offset |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.28 | API | True | RunKeeper | Fri, 23 Jun 2017 12:40:11 | 148 | 5345.75 | outdoor | Cycling | /fitnessActivities/1005956310 | -3 |
| 1 | 19.00 | API | True | RunKeeper | Fri, 23 Jun 2017 07:50:54 | 143 | 6081.16 | outdoor | Cycling | /fitnessActivities/1005843765 | -3 |
| 2 | 21.42 | API | True | RunKeeper | Thu, 22 Jun 2017 12:41:16 | 184 | 6589.62 | outdoor | Cycling | /fitnessActivities/1005535586 | -3 |
| 3 | 20.58 | API | True | RunKeeper | Thu, 22 Jun 2017 07:37:35 | 147 | 6162.94 | outdoor | Cycling | /fitnessActivities/1005303953 | -3 |
| 4 | 19.30 | API | True | RunKeeper | Tue, 20 Jun 2017 12:35:21 | 156 | 5382.67 | outdoor | Cycling | /fitnessActivities/1004255363 | -3 |
| 5 | 19.88 | API | True | RunKeeper | Tue, 20 Jun 2017 07:30:08 | 144 | 6086.99 | outdoor | Cycling | /fitnessActivities/1004088987 | -3 |
| 6 | 17.02 | API | True | RunKeeper | Mon, 19 Jun 2017 12:36:20 | 141 | 5389.12 | outdoor | Cycling | /fitnessActivities/1003621881 | -3 |
| 7 | 23.08 | API | True | RunKeeper | Mon, 19 Jun 2017 07:43:02 | 156 | 6143.74 | outdoor | Cycling | /fitnessActivities/1003493993 | -3 |
| 8 | 50.43 | API | True | RunKeeper | Sat, 17 Jun 2017 14:47:50 | 310 | 8859.43 | outdoor | Cycling | /fitnessActivities/1002663916 | -3 |
| 9 | 18.17 | API | True | RunKeeper | Wed, 14 Jun 2017 13:26:09 | 153 | 5753.60 | outdoor | Cycling | /fitnessActivities/1000879327 | -3 |
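A minimal sketch of the same conversion on three records copied from the API response above. It assumes a recent pandas release, where `json_normalize` is exposed as `pd.json_normalize` (the notebook imports it from `pandas.io.json`); the variable names are illustrative.

```python
import pandas as pd

# Three records copied from the API response shown earlier (fields trimmed)
items = [
    {"duration": 1037, "total_distance": 5345.75479639154, "type": "Cycling"},
    {"duration": 1140, "total_distance": 6081.16390011933, "type": "Cycling"},
    {"duration": 1285, "total_distance": 6589.61628038915, "type": "Cycling"},
]

# Flatten the list of JSON objects into one row per activity
sample_df = pd.json_normalize(items)

# Seconds -> minutes, then round both numeric columns to two decimals
sample_df["duration"] = (sample_df["duration"] / 60).round(2)
sample_df["total_distance"] = sample_df["total_distance"].round(2)

print(sample_df)
```

The rounded values match the first three rows of the data frame shown above.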


## Data export [optional]

The cell below exports the data to a CSV file so that it can be opened in Excel.

In [210]:
# Export the new dataSet to csv
api_df.to_csv('dataSource.csv', encoding="utf-8")

<hr>

# II - Summary section

#### In this section, we'll handle the user profile information.

In [5]:
# ==========================================================
# User information request
# ==========================================================

# Base URI
profile_URI = "http://api.runkeeper.com/profile?access_token=" + ACCESS_TOKEN

# Receive the results from API
profile_content = requests.get(profile_URI).json()

# Base URI
weight_URI = "http://api.runkeeper.com/weight?access_token=" + ACCESS_TOKEN

# Receive the results from API
weight_content = requests.get(weight_URI).json()
weight_df = json_normalize(weight_content['items'])

# ==========================================================
# Display information
# ==========================================================
display(Markdown('# Hello, ' + str(profile_content['name']) + '!'))

# Name
if profile_content['name']:
    display(Markdown('**Your name:** ' + str(profile_content['name'])))

# Birthday
if profile_content['birthday']:
    display(Markdown('**Birthday:** ' + str(profile_content['birthday'])))
    
# Gender
if profile_content['gender']:
    display(Markdown('**Gender:** ' + str(profile_content['gender'])))

# Location
if profile_content['location']:
    display(Markdown('**Location:** ' + str(profile_content['location'])))
    
# Profile
if profile_content['profile']:
    display(Markdown('**Profile:** ' + str(profile_content['profile'])))
    
display(Markdown('<hr>'))

# ==========================================================
# Weight chart
# ==========================================================

# Tools
TOOLS = 'box_zoom,box_select,crosshair,resize,hover,reset,save'

data_weight = weight_df[np.isfinite(weight_df['weight'])].reset_index()
data_weight = data_weight['weight'].sort_index(ascending=False)

# Make a line chart with the dataSet
line = Line(data=data_weight, legend="top_left", background_fill_color="#E8DDCB", 
            ylabel='Weight', tools=TOOLS, plot_width=900)

# Configure visual properties on a plot's title attribute
line.title.text = "Historical weight"
line.title.align = "center"
line.title.text_font_size = "25px"

# Call the output_notebook() 
output_notebook()

# Display the plot
show(line)

# ==========================================================
# Body fat chart
# ==========================================================

data_fat = weight_df[np.isfinite(weight_df['fat_percent'])]
data_fat = data_fat['fat_percent']

# Make a line chart with the dataSet
line = Line(data=data_fat, legend="top_left", background_fill_color="#E8DDCB", 
            ylabel='Fat(%)', tools=TOOLS, plot_width=900)

# Configure visual properties on a plot's title attribute
line.title.text = "Historical fat %"
line.title.align = "center"
line.title.text_font_size = "25px"

# Call the output_notebook() 
output_notebook()

# Display the plot
show(line)

# Hello, Igor!

**Your name:** Igor

**Birthday:** Sun, 26 Apr 1992 00:00:00

**Gender:** M

**Location:** Natal - RN, Brazil

**Profile:** https://runkeeper.com/user/igorbrandao

<hr>

# III - Statistic section

#### In this section, we'll handle the statistical information.

#### The idea is to use a ***top-down analysis***, moving from the most general context to the most specific one.

## 1) Activities summary

In [6]:
# ==========================================================
# Display activities summary
# ==========================================================
display(Markdown('## Activities summary'))

display(Markdown('**Total activities:** ' + str(api_content['size'])))
display(Markdown('**Total duration:** ' + str("%.2f" % api_df['duration'].sum()) + ' minutes' + 
                ' or ' + str("%.0f" % (api_df['duration'].sum()/60)) + ' hours'))
# total_distance is reported in meters, so convert to km before labelling
display(Markdown('**Total distance:** ' + str("%.2f" % (api_df['total_distance'].sum()/1000)) + ' km'))
display(Markdown('**Total calories burned:** ' + str(api_df['total_calories'].sum()) + ' kcal'))

## Activities summary

**Total activities:** 245

**Total duration:** 7134.45 minutes or 119 hours

**Total distance:** 1280.45 km

**Total calories burned:** 57597 kcal
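The Health Graph API reports `total_distance` in meters and `duration` in seconds, so totals need a unit conversion before they are labelled in km or hours. A minimal sketch using the three records shown in the API response; the variable names are illustrative.

```python
# (duration in seconds, distance in meters, calories) for three activities
# copied from the API response shown earlier
activities = [
    (1037, 5345.75479639154, 148),
    (1140, 6081.16390011933, 143),
    (1285, 6589.61628038915, 184),
]

total_seconds = sum(a[0] for a in activities)
total_meters = sum(a[1] for a in activities)
total_calories = sum(a[2] for a in activities)

total_minutes = total_seconds / 60
total_km = total_meters / 1000  # meters -> kilometers

print("Total duration: %.2f minutes" % total_minutes)
print("Total distance: %.2f km" % total_km)
print("Total calories burned: %d kcal" % total_calories)
```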

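## 2) Activities overall times

**[Analysis]:** This chart shows the time the user dedicates to each activity type. Because the bars use `agg='mean'` on `duration`, which was converted to minutes above, each bar reads as the average length of one activity in minutes rather than a total in hours.

On average, the user spent approximately **35 minutes** per run and about **25 minutes** per ride.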

In [7]:
# =================================================================================
# Data selection
# =================================================================================

# =================================================================================
# Chart plotting
# =================================================================================

# Tools
TOOLS = 'box_zoom,box_select,crosshair,resize,reset,hover,save'

# Make a bar chart: p
p = Bar(api_df, values='duration', label='type', agg='mean', color='type',
            legend='bottom_right', background_fill_color="#E8DDCB",
            plot_width=750, plot_height=500, tools=TOOLS)

# Set the y and x axis label
p.yaxis.axis_label= 'Activity overall time'
p.xaxis.axis_label= 'Activity type'

# Set hover to bars
hover = p.select(dict(type=HoverTool))
hover.tooltips = [('Activity type', '@type'),('Average time:',' @height')]

# Configure visual properties on a plot's title attribute
p.title.text = "Overall time by activity type"
p.title.align = "center"
p.title.text_font_size = "25px"

# Call the output_notebook() 
output_notebook()
show(p)
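The `agg` argument decides what a bar's height means: `agg='mean'` gives the average duration of a single activity, while `agg='sum'` gives the total time per type. A small standard-library sketch of the difference; the durations below are hypothetical sample values in minutes and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (type, duration in minutes) records
records = [
    ("Running", 30.0), ("Running", 40.0),
    ("Cycling", 20.0), ("Cycling", 25.0), ("Cycling", 30.0),
]

# Group the durations by activity type
by_type = defaultdict(list)
for activity_type, minutes in records:
    by_type[activity_type].append(minutes)

# Same groups, two different aggregations
mean_minutes = {t: mean(v) for t, v in by_type.items()}
total_hours = {t: sum(v) / 60 for t, v in by_type.items()}

print(mean_minutes)
print(total_hours)
```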

## 3) Activities calories burning

**[Analysis]:** In this chart, we can see that the user burns **more calories running than cycling**.

The bar chart shows the total calories burned per activity type. In this case the user burned about ***33,000 calories running against 25,000 cycling***.

The box plot compares the per-activity calorie distribution of running and cycling, where **running activities burned more calories than cycling**.

In [8]:
# =================================================================================
# Data selection
# =================================================================================

# =================================================================================
# Chart plotting
# =================================================================================

# Tools
TOOLS = 'box_zoom,box_select,crosshair,resize,hover,reset,save'

# Make a bar chart: bar
bar = Bar(api_df, values='total_calories', label='type', agg='sum', color='type',
            legend='bottom_right', background_fill_color="#E8DDCB",
            tools=TOOLS)

# Make a box plot: box
box = BoxPlot(api_df, values='total_calories', label='type', color='type',
            legend='bottom_right', background_fill_color="#E8DDCB",
            tools=TOOLS)

# Set the y and x axis labels on both charts (not on the previous cell's chart p)
for chart in (bar, box):
    chart.yaxis.axis_label = 'Activity total calories'
    chart.xaxis.axis_label = 'Activity type'

# Set hover to bars
hover = bar.select(dict(type=HoverTool))
hover.tooltips = [('Activity type', '@type'),('Total calories:',' @height')]

# Configure visual properties on a plot's title attribute
bar.title.text = "Total calories burned by activity type"
bar.title.align = "center"
bar.title.text_font_size = "25px"

box.title.text = "Total calories burned by activity type"
box.title.align = "center"
box.title.text_font_size = "25px"

# Create a list containing plots
row1 = [bar,box]

# Create a gridplot using row1: layout
layout = gridplot([row1],sizing_mode='scale_width', plot_height=750)

# Call the output_notebook() 
output_notebook()
show(layout)
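A box plot summarizes each group by its quartiles and flags points far outside them. As a rough sketch of what the cycling box encodes, the snippet below computes quartiles and the conventional 1.5 × IQR fences for the ten `total_calories` values shown in the data frame output earlier. The quartile method (`inclusive`) is an assumption; Bokeh's `BoxPlot` may interpolate differently.

```python
from statistics import quantiles

# total_calories of the ten cycling activities listed in the data frame output
calories = [148, 143, 184, 147, 156, 144, 141, 156, 310, 153]

# Quartiles (Q1, median, Q3) using linear interpolation between data points
q1, q2, q3 = quantiles(calories, n=4, method="inclusive")
iqr = q3 - q1

# Conventional whisker limits: 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = sorted(c for c in calories if c < lower_fence or c > upper_fence)
print(q1, q2, q3, outliers)
```

With these ten values, the 50-minute ride (310 kcal) and the 184 kcal ride fall above the upper fence and would be drawn as outliers.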

## 4) Activities by period

First, we need to split the information by period. To achieve that, the idea is to select groups of rows by matching partial strings in the timestamp column.

## Timestamp split into columns

To make the data easier to handle by period, the timestamp column is split into separate year and month columns.

In [9]:
# =================================================================================
# Dataframe timestamp split
# =================================================================================

# Copy the data
data_by_period = api_df.copy()
data_by_period["month"] = 0
data_by_period["month_index"] = 0
data_by_period["year"] = 0

# Fill the years
data_by_period.loc[data_by_period['start_time'].str.contains('2017'), 'year'] = '2017'
data_by_period.loc[data_by_period['start_time'].str.contains('2016'), 'year'] = '2016'
data_by_period.loc[data_by_period['start_time'].str.contains('2015'), 'year'] = '2015'
data_by_period.loc[data_by_period['start_time'].str.contains('2014'), 'year'] = '2014'

# Fill the months
data_by_period.loc[data_by_period['start_time'].str.contains('Jan'), 'month'] = 'Jan'
data_by_period.loc[data_by_period['start_time'].str.contains('Jan'), 'month_index'] = '1'

data_by_period.loc[data_by_period['start_time'].str.contains('Feb'), 'month'] = 'Feb'
data_by_period.loc[data_by_period['start_time'].str.contains('Feb'), 'month_index'] = '2'

data_by_period.loc[data_by_period['start_time'].str.contains('Mar'), 'month'] = 'Mar'
data_by_period.loc[data_by_period['start_time'].str.contains('Mar'), 'month_index'] = '3'

data_by_period.loc[data_by_period['start_time'].str.contains('Apr'), 'month'] = 'Apr'
data_by_period.loc[data_by_period['start_time'].str.contains('Apr'), 'month_index'] = '4'

data_by_period.loc[data_by_period['start_time'].str.contains('May'), 'month'] = 'May'
data_by_period.loc[data_by_period['start_time'].str.contains('May'), 'month_index'] = '5'

data_by_period.loc[data_by_period['start_time'].str.contains('Jun'), 'month'] = 'Jun'
data_by_period.loc[data_by_period['start_time'].str.contains('Jun'), 'month_index'] = '6'

data_by_period.loc[data_by_period['start_time'].str.contains('Jul'), 'month'] = 'Jul'
data_by_period.loc[data_by_period['start_time'].str.contains('Jul'), 'month_index'] = '7'

data_by_period.loc[data_by_period['start_time'].str.contains('Aug'), 'month'] = 'Aug'
data_by_period.loc[data_by_period['start_time'].str.contains('Aug'), 'month_index'] = '8'

data_by_period.loc[data_by_period['start_time'].str.contains('Sep'), 'month'] = 'Sep'
data_by_period.loc[data_by_period['start_time'].str.contains('Sep'), 'month_index'] = '9'

data_by_period.loc[data_by_period['start_time'].str.contains('Oct'), 'month'] = 'Oct'
data_by_period.loc[data_by_period['start_time'].str.contains('Oct'), 'month_index'] = '10'

data_by_period.loc[data_by_period['start_time'].str.contains('Nov'), 'month'] = 'Nov'
data_by_period.loc[data_by_period['start_time'].str.contains('Nov'), 'month_index'] = '11'

data_by_period.loc[data_by_period['start_time'].str.contains('Dec'), 'month'] = 'Dec'
data_by_period.loc[data_by_period['start_time'].str.contains('Dec'), 'month_index'] = '12'
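The string matching above needs one pair of statements per month and one per year. As an alternative sketch (not the notebook's method), the timestamp can be parsed once with `datetime.strptime`. The format string below is an assumption inferred from values such as `Fri, 23 Jun 2017 12:40:11`, and it relies on English day and month names in the active locale. `split_period` is a hypothetical helper name.

```python
from datetime import datetime

# Assumed format of the API's start_time field, e.g. "Fri, 23 Jun 2017 12:40:11"
START_TIME_FORMAT = "%a, %d %b %Y %H:%M:%S"

def split_period(start_time):
    """Return (year, month_index, month_abbreviation) for one timestamp string."""
    parsed = datetime.strptime(start_time, START_TIME_FORMAT)
    return parsed.year, parsed.month, parsed.strftime("%b")

timestamps = ["Fri, 23 Jun 2017 12:40:11", "Sat, 17 Jun 2017 14:47:50"]
periods = [split_period(ts) for ts in timestamps]
print(periods)
```

Keeping the month index as an integer also makes sorting chronological, whereas string indexes would place `'10'` before `'2'`.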


## Data export [optional]

To avoid recomputing the period columns every time, we export the new dataset to a CSV file. You can skip this operation because the new CSV dataset is already included in the project.

In [214]:
# Export the new dataSet to csv
data_by_period.to_csv('dataSourceByPeriod.csv', encoding="utf-8")
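One side effect of the export/import round trip is worth knowing: `to_csv` writes the row index as an extra unnamed column unless `index=False` is passed, and `read_csv` then loads it back as `Unnamed: 0`. Numeric-looking strings such as the month index also come back as integers. A minimal sketch with an in-memory buffer and hypothetical values:

```python
import io
import pandas as pd

# Hypothetical miniature of data_by_period, with month_index stored as strings
frame = pd.DataFrame({"month": ["Jun", "Oct"], "month_index": ["6", "10"]})

# Default export: the index is written as an extra, unnamed first column
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
reloaded = pd.read_csv(with_index)
print(reloaded.columns.tolist())

# index=False keeps the column set unchanged across the round trip
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
restored = pd.read_csv(without_index)
print(restored.columns.tolist())
print(restored["month_index"].tolist())
```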

## All years overview

In [10]:
# =================================================================================
# Data import
# =================================================================================

# Import the dataSourceByPeriod.csv data: data_by_period
data_by_period = pd.read_csv("dataSourceByPeriod.csv", encoding = 'utf-8')

# =================================================================================
# Year selection
# =================================================================================

yearList = data_by_period.year.unique()

for idx, val in enumerate(yearList):

    # =================================================================================
    # Data selection
    # =================================================================================

    year = val

    # Filter the activities by year
    activitiesYear = data_by_period[data_by_period["year"] == year].sort_values('month_index')

    # =================================================================================
    # Data count
    # =================================================================================

    # Work on a copy to count the activities by type
    activityType = activitiesYear.copy()
    activityType["Count"] = 0

    # Count the activities of each type
    activityType = pd.DataFrame(activityType.groupby(["type"])['Count'].count()).reset_index()

    # =================================================================================
    # Chart plotting
    # =================================================================================

    # Tools
    TOOLS = 'box_zoom,box_select,crosshair,resize,hover,reset,save'

    barMonth = Bar(activitiesYear, values='total_distance', label=CatAttr(columns=['month'], sort=False), color='month', agg='sum', 
                          legend='top_left', tools=TOOLS)

    barCaloryMonth = Bar(activitiesYear, values='total_calories', label=CatAttr(columns=['month'], sort=False), color='month', agg='sum', 
                          legend='top_left', tools=TOOLS)

    barDurationMonth = Bar(activitiesYear, values='duration', label=CatAttr(columns=['month'], sort=False), color='month', agg='mean', 
                          legend='top_left', tools=TOOLS)

    # Donut chart settings
    donutType = Donut(activityType, label=['type', 'Count'], values='Count',
              text_font_size='14pt', legend='top_left', 
              tools=TOOLS, background_fill_color="#E8DDCB", title='Activity types', 
              color='type')

    # Set the y and x axis label
    barMonth.yaxis.axis_label= 'Total distance'
    barMonth.xaxis.axis_label= 'Month'

    barCaloryMonth.yaxis.axis_label= 'Total calories'
    barCaloryMonth.xaxis.axis_label= 'Month'

    barDurationMonth.yaxis.axis_label= 'Average duration'
    barDurationMonth.xaxis.axis_label= 'Month'

    # Set hover to bars
    hover = barMonth.select(dict(type=HoverTool))
    hover.tooltips = [('Month', '@x'),('Total distance:',' @height')]

    hover = barCaloryMonth.select(dict(type=HoverTool))
    hover.tooltips = [('Month', '@x'),('Total calories:',' @height')]

    hover = barDurationMonth.select(dict(type=HoverTool))
    hover.tooltips = [('Month', '@x'),('Average duration:',' @height')]

    hover = donutType.select(dict(type=HoverTool))
    hover.tooltips = [('Activity', '@x'),('Count:',' @height')]

    # Configure visual properties on a plot's title attribute
    barMonth.title.text = "Total distance by month in " + str(year)
    barMonth.title.align = "center"
    barMonth.title.text_font_size = "25px"

    barCaloryMonth.title.text = "Total calories by month in " + str(year)
    barCaloryMonth.title.align = "center"
    barCaloryMonth.title.text_font_size = "25px"

    barDurationMonth.title.text = "Average duration by month in " + str(year)
    barDurationMonth.title.align = "center"
    barDurationMonth.title.text_font_size = "25px"

    donutType.title.text = "Activities types in " + str(year)
    donutType.title.align = "center"
    donutType.title.text_font_size = "25px"

    # Create a list containing plots
    row1 = [barMonth, barCaloryMonth]
    row2 = [barDurationMonth, donutType]

    # Create a gridplot using row1 and row2: layout
    layout = gridplot([row1, row2],sizing_mode='scale_width', plot_height=650, plot_width=900)

    # Call the output_notebook()
    display(Markdown('# Activities overview in ' + str(year)))
    display(Markdown('<hr>'))

    output_notebook()
    show(layout)

# Activities overview in 2017

<hr>

# Activities overview in 2016

<hr>

# Activities overview in 2015

<hr>

# Activities overview in 2014

<hr>

# IV - Geolocation section

#### In this section, we'll handle the geolocation information.

## Location import [warning: too heavy]

The cell below performs multiple requests to the Runkeeper Health Graph API to retrieve the details of every activity, including location information such as latitude and longitude.

In [None]:
# =================================================================================
# Geolocation data
# =================================================================================

# Start with an empty data frame to collect the path points
geolocation_data = pd.DataFrame()

# =================================================================================
# API single activity request
# =================================================================================

# Base URI
base_URI = "http://api.runkeeper.com"

limit = 2

# Run through all activities
for idx, row in tqdm(api_df.iterrows(), total=len(api_df)):
    
    # Final activity URI
    activity_url = base_URI + row['uri'] + "?access_token=" + ACCESS_TOKEN

    # Receive the results from API
    activity_content = requests.get(activity_url).json()

    # Skip activities that come back without a recorded path
    if not activity_content.get('path'):
        continue

    # Perform a conversion from JSON to Data Frame
    activity_df = json_normalize(activity_content['path'])
    
    # Add the activity path data to geolocation
    geolocation_data = pd.concat([geolocation_data, activity_df])
    
geolocation_data
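The collection step of the loop above can also be sketched on its own, using plain dictionaries in place of API responses: rows are gathered in a list, and activities without a `path` entry are skipped instead of raising a `KeyError`. The sample coordinates and the helper name `collect_path_points` are hypothetical.

```python
def collect_path_points(activities):
    """Flatten the 'path' lists of several activity dicts into one list of rows."""
    rows = []
    for activity in activities:
        # Activities recorded without GPS may have no 'path' entry
        for point in activity.get("path") or []:
            rows.append({
                "uri": activity["uri"],
                "latitude": point["latitude"],
                "longitude": point["longitude"],
            })
    return rows

# Hypothetical responses: one activity with two GPS points, one without a path
sample_activities = [
    {"uri": "/fitnessActivities/1", "path": [
        {"latitude": -5.79, "longitude": -35.20},
        {"latitude": -5.80, "longitude": -35.21},
    ]},
    {"uri": "/fitnessActivities/2"},
]

points = collect_path_points(sample_activities)
print(len(points))
```

The resulting list can be handed to `pd.DataFrame(points)` once, after the loop, which avoids rebuilding the data frame on every iteration.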

## Data export [optional]

To avoid re-running the cell above, here we save the processed geolocation data.

In [279]:
# Export the new dataSet to csv
geolocation_data.to_csv('geolocation.csv', encoding="utf-8")

## Geolocation data import [optional]

In [11]:
# Import the geolocation.csv data: geolocation_data
geolocation_data = pd.read_csv("geolocation.csv", encoding = 'utf-8')

## 1) Heatmap

In [None]:
# =================================================================================
# Data adjustments
# =================================================================================

# Count the same latitude/longitude occurrences
geodata = geolocation_data.groupby(['longitude', 'latitude']).size().reset_index().rename(columns={0:'count'})

# =================================================================================
# Heatmap generation
# =================================================================================

# Set map center and zoom level
mapc = [-5.788, -35.202]
zoom = 11

# Initialize the coordinates array
coordinates = []

# Add each [latitude, longitude, count] triple to the coordinates list
for i in tqdm(range(len(geodata))):
    # Eliminate items with a 'nan' element (.loc replaces the deprecated .ix)
    if all(~np.isnan([geodata.loc[i,'latitude'], geodata.loc[i,'longitude'], geodata.loc[i,'count']])):
        coordinates.append([geodata.loc[i,'latitude'], geodata.loc[i,'longitude'], geodata.loc[i,'count']])

# Create map object
htMap = folium.Map(location=mapc, zoom_start=zoom)

# Append the coordinates to the heatMap
HeatMap(coordinates).add_to(htMap)

# Display the heatMap
htMap
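The `groupby(...).size()` step counts how many recorded points share the same coordinates, and that count becomes the heat map weight. The same idea in plain Python, as a sketch with hypothetical coordinates:

```python
from collections import Counter

# Hypothetical (latitude, longitude) points; the first location repeats
points = [(-5.79, -35.20), (-5.79, -35.20), (-5.80, -35.21)]

# Count identical coordinate pairs
counts = Counter(points)

# [latitude, longitude, weight] triples, the shape passed to HeatMap above
coordinates = [[lat, lon, n] for (lat, lon), n in sorted(counts.items())]
print(coordinates)
```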

## 2) Map markers

In [None]:
# =================================================================================
# Data adjustments
# =================================================================================

# Count the same latitude/longitude occurrences
geodata = geolocation_data.groupby(['longitude', 'latitude']).size().reset_index().rename(columns={0:'count'})

# =================================================================================
# Map markers generation
# =================================================================================

# Set map center and zoom level
mapc = [-5.788, -35.202]
zoom = 11

# For speed purposes
MAX_RECORDS = 100

# Create map object
map_activities = folium.Map(location=mapc, zoom_start=zoom)

# Add a marker for each coordinate pair
for i in tqdm(range(len(geodata))):
    
    if i == MAX_RECORDS:
        break
    
    # Eliminate items with a 'nan' element (.loc replaces the deprecated .ix)
    if all(~np.isnan([geodata.loc[i,'latitude'], geodata.loc[i,'longitude']])):
        folium.Marker(location = [geodata.loc[i,'latitude'],geodata.loc[i,'longitude']]).add_to(map_activities)

# Show the map
map_activities

# V - Web scraping section

#### In this section, we'll try to get information about the user's friends.

## Web scraping functions

In [13]:
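# Return a BeautifulSoup object, or None when the request fails
def getSoup(url):
    try:
        html = requests.get(url)
        # Turn 4xx/5xx responses into exceptions
        html.raise_for_status()
    except requests.exceptions.RequestException as e:
        # requests raises its own exceptions, not urllib's HTTPError
        return None
    return BeautifulSoup(html.content, 'html.parser')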

In [14]:
# Remove the dirty part from link
def getLink(url, dirty):
    result = url.split(dirty)
    return result[0]
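`getLink` keeps everything before the first occurrence of `dirty`. A quick usage sketch; the function is repeated so the snippet runs on its own, and the URL is built from the profile link shown earlier.

```python
# Same helper as above, repeated so the snippet is self-contained
def getLink(url, dirty):
    result = url.split(dirty)
    return result[0]

friends_page = "https://runkeeper.com/user/igorbrandao/friends"

# Strip the trailing '/friends' segment to recover the profile URL
print(getLink(friends_page, "/friends"))

# When 'dirty' does not occur, the URL comes back unchanged
print(getLink(friends_page, "?page="))
```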

## Web scraping flow starts here

This cell gathers basic stats for the user and their friends: total activities, total distance, and total calories burned.

In [15]:
# Profile
if profile_content:

    # =================================================================================
    # Web scraping
    # =================================================================================
    
    # Get the content from user profile
    soup = getSoup(profile_content['profile'] + '/friends')
    
    # Array with users URI
    friends_URI = []
    friends_name = []
    
    # Get the friends ID
    for item in soup.findAll(class_='usernameLinkNoSpace user-name'):
        
        # Check if it's a link
        if 'href' in item.attrs:
                
            # Check if the link URI contains a specific string
            if '/user/' in str(item.attrs):

                # Append the link URI
                friends_URI.append(item.attrs['href'])
                
                # Append the friend name
                friends_name.append(item.text.strip(' \t\n\r'))
                
    # Friends info array
    friends = []
    temp = []
    
    # Run through friends profile URI array
    for idx, val in enumerate(friends_URI):
    
        # Get the content from profile friends page
        friend_uri = 'https://runkeeper.com' + val
        soup_friend = getSoup(friend_uri)
        
        # Get the stats section
        for item in soup_friend.findAll('div', id='statsSection'):
            
            if item:

                # Get the stats
                for subitem in item.findAll(class_='statValue'):
                    
                    # Add the stats to a temporary array
                    if subitem.text.strip():
                        temp.append(subitem.text.strip(' \t\n\r'))
                        
                break;
                    
        # Add the stats to friends array
        friends.append(temp)
        temp = []
    
else:
    print('No data available')
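To make the link-filtering step easier to follow without a network connection, here is a sketch that applies the same rule (keep anchors whose `href` contains `/user/`) to a hand-written HTML fragment. It swaps BeautifulSoup for the standard library's `html.parser` so it has no dependencies; the markup and the `FriendLinkParser` class are hypothetical and do not reflect Runkeeper's real page structure.

```python
from html.parser import HTMLParser

class FriendLinkParser(HTMLParser):
    """Collect (href, text) pairs for anchors whose href contains '/user/'."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href and "/user/" in href:
            self._current_href = href

    def handle_data(self, data):
        # Text inside a matching anchor is the friend's display name
        if self._current_href and data.strip():
            self.links.append((self._current_href, data.strip()))
            self._current_href = None

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None

# Hypothetical fragment of a friends page
html = """
<a class="user-name" href="/user/alice"> Alice </a>
<a class="user-name" href="/about">About</a>
<a class="user-name" href="/user/bob">Bob</a>
"""

parser = FriendLinkParser()
parser.feed(html)
print(parser.links)
```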

In [16]:
# =================================================================================
# Data presentation
# =================================================================================

display(Markdown('# Friends activities summary'))

# Run through friends information
for idx, val in enumerate(friends_name):

    display(Markdown('#  ' + val))
    display(Markdown('**Total activities:** ' + friends[idx][0]))
    display(Markdown('**Total distance:** ' + friends[idx][1] + ' km'))
    display(Markdown('**Total calories burned:** ' + friends[idx][2] + ' kcal'))

# Friends activities summary

#  Igor

**Total activities:** 245

**Total distance:** 1280.45 km

**Total calories burned:** 57597 kcal

#  Leandro Max

**Total activities:** 14

**Total distance:** 31.17 km

**Total calories burned:** 2486 kcal

#  Ivan Alisson

**Total activities:** 24

**Total distance:** 85.96 km

**Total calories burned:** 5304 kcal

#  Diego Brandão

**Total activities:** 2

**Total distance:** 8.81 km

**Total calories burned:** 653 kcal
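The scraped values are strings, so they must be converted before any comparison. A small sketch using the figures printed above, ranking friends by distance and deriving the average distance per activity. The `to_number` helper, including its handling of thousands separators, is an assumption about how such values might be formatted.

```python
def to_number(text):
    """Convert a scraped stat such as '1,280.45' or '245' to a float (assumed format)."""
    return float(text.replace(",", "").strip())

# name -> [total activities, total distance (km), total calories], as printed above
friends_stats = {
    "Igor": ["245", "1280.45", "57597"],
    "Leandro Max": ["14", "31.17", "2486"],
    "Ivan Alisson": ["24", "85.96", "5304"],
    "Diego Brandão": ["2", "8.81", "653"],
}

# Rank friends by total distance, longest first
ranking = sorted(friends_stats,
                 key=lambda name: to_number(friends_stats[name][1]),
                 reverse=True)

# Average distance per activity for the profile owner
igor = [to_number(value) for value in friends_stats["Igor"]]
km_per_activity = round(igor[1] / igor[0], 2)

print(ranking)
print(km_per_activity)
```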