# Welcome

# Challenges & Solutions

## Challenges

## Limitations encountered in Section 1: Downloading data
- The limitations encountered while using the CoinMarketCap API
- The limitations encountered with Python web scraping



### The limitations encountered while using the CoinMarketCap API
The instruction for this was "*It's best if you connect to the API, but if this information is not available in their API you could use a python web scraping tool to do the job.*"
CoinMarketCap has a really good API, which I signed up for, but I am limited to their **basic** subscription. This plan does not give access to the **"Market pairs"** endpoint, which is the endpoint that has all the columns I need; this is stated clearly in their documentation [here](https://coinmarketcap.com/api/documentation/v1/#operation/getV1ExchangeMarketpairsLatest).
> If you are interested in seeing my attempt and the data available on their **basic plan**, check **my Google Colab notebook [Click here](https://colab.research.google.com/drive/1Xq3eQlm7mN41oIhZIsQrhEw578ymtQXu?usp=sharing)**

Next, I looked for their API on **RapidAPI's API Hub**. I found one [Click here](https://rapidapi.com/community/api/coinbase/), but unfortunately it does not have the endpoint needed to populate the columns for this project.
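
For reference, a request to the restricted endpoint could be sketched as follows. This is a minimal sketch under stated assumptions, not the code from the Colab notebook: the endpoint path is inferred from the operation name in the linked documentation, `slug` is assumed to be an accepted query parameter, and `YOUR_API_KEY` is a placeholder. The helper names (`build_request`, `fetch`, `error_message`) are hypothetical. Only the standard library is used, and nothing is sent until `fetch` is called.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://pro-api.coinmarketcap.com"
# Assumption: path inferred from the linked docs operation; verify it there.
MARKET_PAIRS_PATH = "/v1/exchange/market-pairs/latest"


def build_request(api_key, path, params):
    """Build (but do not send) an authenticated GET request."""
    url = f"{BASE_URL}{path}?{urllib.parse.urlencode(params)}"
    headers = {"X-CMC_PRO_API_KEY": api_key, "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)


def fetch(request, timeout=30):
    """Send the request and return (http_status, parsed_json_body)."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # Assumption: error responses also carry a JSON body.
        return err.code, json.load(err)


def error_message(body):
    """Return the API's error message, or None when there is none."""
    return body.get("status", {}).get("error_message")


# Building the request does not touch the network.
request = build_request("YOUR_API_KEY", MARKET_PAIRS_PATH, {"slug": "binance"})
print(request.full_url)
```

With a real key, `status, body = fetch(request)` followed by `error_message(body)` should make it clear whether a failure comes from the subscription plan or from something else.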

### The limitations encountered with Python web scraping
Python's Selenium and BeautifulSoup libraries were used for web scraping. I tried three approaches, and each had its limitations:
1. using only the BeautifulSoup library
2. using only the Selenium library
3. using a combination of the Selenium and BeautifulSoup libraries

The code for each approach is shown in the sections that follow.

### 1. Using only the BeautifulSoup library

---
The limitation of this method was that BeautifulSoup could not scrape the market table; the lookup returned `None`.

**Possible reasons the scraping failed:**
- the page's table is dynamic, and BeautifulSoup only parses the HTML it is given; it does not run JavaScript
- the page relies on too much underlying JavaScript for BeautifulSoup to work with on its own

The code for this attempt:

In [2]:
# importing libraries
from bs4 import BeautifulSoup
import requests

# Assign bitcoin market link to a variable
link = 'https://coinmarketcap.com/currencies/bitcoin/markets/'

# use requests to get the raw HTML content from the website
HTML_script = requests.get(link)

In [3]:
# The HTTP 200 OK success status response code indicates that the request has succeeded 
HTML_script

<Response [200]>

In [4]:
# now to extract the text from the HTML_script
text_script = HTML_script.text

# parse the text with BeautifulSoup, using 'lxml' as the parser
useful_text = BeautifulSoup(text_script, "lxml") 

# the parsed document can be printed with the prettify() method
# print(useful_text.prettify())

In [24]:
# printing the page's title
print(useful_text.title.text)

Bitcoin price today, BTC to USD live, marketcap and chart | CoinMarketCap


In [26]:
# after inspecting the page I obtained the market table's HTML
# asking bs4 to find the table by specifying its class
table = useful_text.find('table', {'class':'h7vnx2-2 ecUULi cmc-table  '})

# bs4 is unable to find the market table, so None is returned
display(table)

None
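
One way to test the "dynamic page" explanation is to check whether the downloaded HTML contains a `<table>` element at all. The sketch below counts opening tags with the standard library's `html.parser` (used here in place of BeautifulSoup so it runs with no extra packages). The `static_html` string is a hypothetical stand-in for a JavaScript-rendered page, not the real CoinMarketCap response, and `TagCounter` / `count_tags` are illustrative names.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times each opening tag appears in a document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html):
    counter = TagCounter()
    counter.feed(html)
    return counter.counts


# Hypothetical page: an empty mount point that a script would fill in later.
static_html = """
<html><head><title>Bitcoin Markets</title></head>
<body><div id="__next"></div><script src="app.js"></script></body></html>
"""

counts = count_tags(static_html)
print(counts.get("table", 0))   # number of <table> elements in the static HTML
print(counts.get("script", 0))  # number of <script> elements
```

With the notebook's variables, the same check would be `count_tags(HTML_script.text)`. A table count of zero would support the dynamic-page explanation, while a non-zero count would point at the class lookup instead.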

### 2. Using only the Selenium library

---
The Python code ran without errors and Selenium could scrape the market table, but some columns came back mostly empty. I could not determine the cause. I tried troubleshooting, but:
- I found the Python documentation for Selenium thin, which made troubleshooting hard
- Support for Selenium with Python seemed narrower to me than for Selenium with Java

The code for this attempt:

In [27]:
# Load selenium components
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait, Select

# Establish chrome driver and go to bitcoin market URL
url = 'https://coinmarketcap.com/currencies/bitcoin/markets/'
driver = webdriver.Chrome()
driver.get(url)

# locating the table body by XPath, then collecting all of its rows
# (find_element(By...) replaces the find_element_by_* helpers removed in Selenium 4)
rows = driver.find_element(By.XPATH, '//*[@id="__next"]/div/div[1]/div[2]/div/div[3]/div/div[2]/div/table/tbody').find_elements(By.TAG_NAME, 'tr')

In [29]:
# scraping the page title
driver.find_element(By.XPATH, '//*[@id="__next"]/div/div[1]/div[2]/div/div[3]/div/div[1]/div[1]/h2').text

'Bitcoin Markets'

### Scraping the 'source' column of the market table works fine

In [31]:
source=[]

# scraping the first 100 rows
for x in range(100):
    # get the source (exchange) name from the second cell
    source.append(rows[x].find_elements(By.TAG_NAME, 'td')[1].text)
    
# printing the number of values collected
print(len(source))

# printing the data scraped
print(source)

100
['Binance', 'Huobi Global', 'Coinbase Exchange', 'Binance', 'FTX', 'KuCoin', 'Binance', 'Bitfinex', 'Coincheck', 'Gate.io', 'Binance', 'bitFlyer', 'FTX', 'Kraken', 'Kraken', 'Binance', 'Bitstamp', 'Binance', 'Bithumb', 'FTX US', 'Liquid', 'Binance', 'Gemini', 'Binance', 'Coinbase Exchange', 'FTX US', 'Binance', 'Binance', 'Binance.US', 'Binance', 'Binance', 'Bitfinex', 'Coinbase Exchange', 'Bitstamp', 'Coinbase Exchange', 'Huobi Global', 'Binance', 'Binance', 'Binance', 'Poloniex', 'Binance', 'Coinbase Exchange', 'Coinbase Exchange', 'Binance', 'Binance', 'Huobi Global', 'Bitfinex', 'Binance', 'Gate.io', 'Binance', 'Binance', 'Binance', 'Binance', 'Coinbase Exchange', 'Bitfinex', 'FTX', 'Kraken', 'Binance', 'Binance', 'Binance', 'Binance', 'Bitfinex', 'KuCoin', 'bitFlyer', 'KuCoin', 'FTX', 'Binance', 'Binance', 'Binance', 'Binance', 'Binance', 'Binance', 'Binance', 'Binance', 'Coinbase Exchange', 'Binance.US', 'Coinbase Exchange', 'Binance', 'Kraken', 'Binance', 'Bitfinex', 'Binanc

### However, when scraping the 'pairs' column of the table, too many empty results are returned: only 16 of the 100 values are non-empty

In [10]:
pairs=[]

# scraping the first 100 rows
for x in range(100):

    # get pairs from the third cell
    pairs.append(rows[x].find_elements(By.TAG_NAME, 'td')[2].text)

# printing the data scraped
print(pairs)

# printing the number of entries collected (empty strings included)
print(len(pairs))


['BTC/USDT', 'BTC/USD', 'BTC/USD', 'BTC/USDT', 'BTC/BUSD', 'BTC/USDT', 'ETH/BTC', 'BTC/USDT', 'BTC/USD', 'BTC/USDT', 'BTC/JPY', 'XBT/USD', 'FTM/BTC', 'XBT/EUR', 'BTC/JPY', 'BTC/EUR', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']


100
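
Because `len(pairs)` counts empty strings too, a small helper makes the "16 of 100" figure explicit. This is a sketch only: `summarize_column` is a hypothetical name, and the sample list is a stand-in shaped like the output above (16 filled entries followed by 84 empty ones), not the scraped data itself.

```python
def summarize_column(values):
    """Report how many scraped cells actually contain text."""
    filled = [v for v in values if v.strip()]
    # Index of the first empty cell, or None if every cell has text.
    first_empty = next((i for i, v in enumerate(values) if not v.strip()), None)
    return {
        "total": len(values),
        "filled": len(filled),
        "first_empty_index": first_empty,
    }


# Stand-in data with the same shape as the 'pairs' output.
sample_pairs = ["BTC/USDT"] * 16 + [""] * 84
print(summarize_column(sample_pairs))
# {'total': 100, 'filled': 16, 'first_empty_index': 16}
```

The `first_empty_index` value shows where the filled rows stop, which helps to tell a single cut-off point apart from scattered gaps.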

### The same pattern of too many empty results is seen in the 'price' column: only 16 of the 100 rows have values

In [30]:
price=[]

for row in rows:
    try:
        price.append(row.find_elements(By.TAG_NAME, 'td')[3].text)
    except (AttributeError, IndexError):
        # stop at the first row that has no fourth cell
        break

# printing the number of prices collected before the loop stopped
print(len(price))
print(price)

16
['$54,587.58', '$54,601.31', '$54,599.35', '$54,589.55', '$54,602.00', '$54,636.28', '$54,636.31', '$54,632.00', '$54,670.95', '$54,637.74', '$54,636.31', '$54,620.43', '$54,618.58', '$54,611.70', '$54,639.97', '$54,209.70']
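
One possible cause, which is an assumption and was not verified against the site, is that the page only fills in rows once they are scrolled into view. A sketch of how that could be tested: scroll each row into the viewport before reading it. The helper `collect_cell_text` and its `read_cell` callback are hypothetical names; `driver.execute_script` is the Selenium call used to run JavaScript in the page, and the fixed pause is a crude stand-in for a proper wait.

```python
import time


def collect_cell_text(driver, rows, read_cell, pause=0.2):
    """Scroll each row into view, wait briefly, then read one cell from it."""
    texts = []
    for row in rows:
        # Ask the browser to bring the row into the viewport first.
        driver.execute_script(
            "arguments[0].scrollIntoView({block: 'center'});", row
        )
        # Give the page a moment to render the row's contents.
        time.sleep(pause)
        texts.append(read_cell(row))
    return texts
```

With the notebook's variables this might be called as `collect_cell_text(driver, rows, lambda row: row.find_elements(By.TAG_NAME, 'td')[2].text)`. If the empty values persist, lazy rendering is probably not the cause.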


### 3. Using the combination of the Selenium and BeautifulSoup libraries
Limited data was still obtained with this method.
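
A combined approach typically means letting Selenium render the page and passing the rendered HTML (`driver.page_source`) to a parser. The sketch below shows only the parsing half, under stated assumptions: it uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without a browser or extra packages, the `rendered_html` string is a hypothetical stand-in for `driver.page_source`, and the column layout (rank, source, pair, price) is inferred from the cell indexes used in the cells above.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows, which hold no <td> cells
                self.rows.append(self._row)
            self._row = None


# Hypothetical rendered markup; the second row mimics an unfilled row.
rendered_html = """
<table>
  <thead><tr><th>#</th><th>Source</th><th>Pairs</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>Binance</td><td><a href="#">BTC/USDT</a></td><td>$54,587.58</td></tr>
    <tr><td>2</td><td>Huobi Global</td><td></td><td></td></tr>
  </tbody>
</table>
"""

parser = TableRows()
parser.feed(rendered_html)
for row in parser.rows:
    print(row)
```

Parsing a snapshot of the rendered page reads every row from the same page state, but cells the browser had not filled in when the snapshot was taken are still empty, which fits the limited data reported for this method.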

---

# Solutions

### Section 1 : Download data

#### Web scraping is not a sure-fire method; APIs are more reliable. For a busy website like CoinMarketCap, which embeds data from other websites and updates it in real time, any of the underlying JavaScript could be interfering with the scraping process. As a result, the data obtained from the website is limited. The paid CoinMarketCap API subscriptions (professional and enterprise) would likely be a more efficient way to get more data.

#### Please keep in mind that, for the sake of seeing this project to completion, the combination of the Selenium and BeautifulSoup libraries will be used. The downside of this method is that only limited data can be scraped.
---

### Section 2: Visualize

Python's pandas library was used for the required tasks.

- No limitations were found in this section

### Section 3: Plot
Python's matplotlib library was used for the required tasks.

- No limitations were found in this section

# Next, move to Section 1