# CoinMarketCap - mining

The goal of this part of the project is to collect information on the coins listed on CoinMarketCap. We are specifically after each coin's rank, name and link to its data page, plus some price information. The price data is not strictly required, but if we are mining the page anyway, why not take all of it?

First, we need ChromeDriver installed, because we use Selenium, a browser-automation tool built for automated testing, to do the mining. Make sure the driver version matches the version of Chrome installed on your machine. Selenium will raise an error if it does not; the fix is easy, and a quick Google search will do the trick!
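Before starting Selenium, it can help to check that the driver binary is where the notebook expects it. The helper below is a hypothetical convenience, not part of Selenium: it only checks that the file exists and is executable (or that `chromedriver` is on the system PATH), and it does not verify that the driver version matches your Chrome install.

```python
import os
import shutil


def chromedriver_available(path=None):
    """Return True if a chromedriver binary can be found.

    Hypothetical helper: if `path` is given, check that file directly;
    otherwise look for an executable named `chromedriver` on the PATH.
    """
    if path is not None:
        # The file must exist and be executable by the current user
        return os.path.isfile(path) and os.access(path, os.X_OK)
    # shutil.which returns None when no matching executable is on the PATH
    return shutil.which("chromedriver") is not None


print(chromedriver_available("../chromedriver"))
```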

## Import libraries


```python
import datetime  # timestamps for progress reporting
import time      # pauses while the page loads

import pandas as pd  # data manipulation
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
```

## Mining settings

Since we use Selenium, we need to specify the path to the ChromeDriver executable, as well as the page to mine and where to store the results.

```python
# path to the ChromeDriver executable
PATH_CHROMEDRIVER = '../chromedriver'

# which website we want to do the mining on
WEBSITE = 'https://www.coinmarketcap.com/all/views/all/'

# where to store the mined csv file
OUTPUT_FOLDER = '../01.Original_data/cmp_rank/'
```
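`DataFrame.to_csv` fails if the output folder does not exist yet, and joining paths with `os.path.join` instead of gluing strings together avoids doubled or missing slashes. A small sketch, assuming you want the folder created on demand; `output_csv_path` is a hypothetical helper name, not something the notebook defines.

```python
import os


def output_csv_path(folder, filename="all_crypto.csv"):
    """Create `folder` if needed and return the full path of the CSV file.

    Hypothetical helper: os.makedirs with exist_ok=True does nothing
    when the folder already exists.
    """
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, filename)
```

It would be called as `csv_path = output_csv_path(OUTPUT_FOLDER)` and the result passed to `to_csv`.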

## Mining

We now have everything we need to do the actual mining.

```python
import os

from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By  # Selenium 4 locator API

# Chrome options (defaults; add flags such as headless mode here if needed)
chrome_options = webdriver.ChromeOptions()

# initialize driver
driver = webdriver.Chrome(service=Service(PATH_CHROMEDRIVER), options=chrome_options)

try:
    # Open website
    driver.get(WEBSITE)

    # Sleep 10 seconds to let the website load
    time.sleep(10)

    # Click "Load More" until the button is gone
    load_more = True
    while load_more:
        try:
            button = driver.find_element(By.XPATH, '//button[text()="Load More"]')
            driver.execute_script("arguments[0].scrollIntoView();", button)
            button.click()
            time.sleep(10)  # wait for the new rows to render
        except NoSuchElementException:
            print("No load more button any more.")
            load_more = False

    # find all the information: the coin table is the third <table> on the page
    tables = driver.find_elements(By.TAG_NAME, 'table')
    cointable = tables[2].find_elements(By.TAG_NAME, 'td')

    time_start = datetime.datetime.now()
    l = []      # one list of cell values per coin
    sub_l = []  # cells of the coin currently being read
    for cnt, item in enumerate(cointable):
        td = item.text
        if len(td) > 0:
            sub_l.append(td)
            # keep the coin's own link, skip links to its markets page
            for el in item.find_elements(By.TAG_NAME, 'a'):
                href = el.get_attribute("href")
                if href is not None and 'markets' not in href:
                    sub_l.append(href)
        else:
            # an empty cell marks the end of a row
            l.append(sub_l)
            sub_l = []
        if cnt % 1000 == 0:
            d_time = (datetime.datetime.now() - time_start).seconds
            print(f"Done {cnt} out of {len(cointable)}. Duration {d_time} seconds.")
    if sub_l:
        # keep the last row if the table does not end with an empty cell
        l.append(sub_l)

    col = ['Rank', 'Name', 'Link', 'Symbol', 'Market_Cap', 'Price', 'Circulating_Supply',
           'Volume(24h)', '%1h', '%24h', '%7d']
    df_allcrypto = pd.DataFrame.from_records(l, columns=col)

    # create the output folder if needed and write the CSV
    os.makedirs(OUTPUT_FOLDER, exist_ok=True)
    df_allcrypto.to_csv(os.path.join(OUTPUT_FOLDER, 'all_crypto.csv'))
finally:
    # always close the browser, even if the scraping fails
    driver.quit()
    print("Chromedriver closed.")
```
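The parsing loop treats the coin table as one flat list of `<td>` cells and closes a row whenever it meets an empty cell. The same grouping, stripped of Selenium so it can be tried on plain strings, might look like the sketch below. It assumes, as the scraper does, that an empty cell marks each row boundary (true of the page layout at the time, not guaranteed for the current site); unlike the notebook loop, it skips empty rows produced by consecutive empty cells. `group_cells` is a hypothetical name.

```python
def group_cells(cells):
    """Split a flat list of cell texts into rows at empty cells.

    An empty string marks a row boundary. Empty rows (e.g. two
    separators next to each other) are skipped.
    """
    rows = []
    current = []
    for text in cells:
        if text:
            current.append(text)
        elif current:
            # Empty cell: close the row collected so far
            rows.append(current)
            current = []
    if current:
        # Keep the final row even if no empty cell follows it
        rows.append(current)
    return rows


cells = ["1", "Bitcoin", "BTC", "", "2", "Ethereum", "ETH", ""]
print(group_cells(cells))
# [['1', 'Bitcoin', 'BTC'], ['2', 'Ethereum', 'ETH']]
```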

## Conclusion

That's it! The rank information for all coins is now stored in all_crypto.csv in the output folder. Note that the code can easily be adjusted to fetch only the top few hundred coins instead of all ~2,700 (the count at the time of writing), for example by clicking "Load More" fewer times. This greatly speeds up the process.
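One way to stop after the top few hundred coins is to cap the number of "Load More" clicks. The sketch below keeps the loop logic separate from Selenium: `click_load_more` is a hypothetical callable you would write yourself, and `load_more_pages` and its `max_clicks`/`pause` parameters are illustrative names. How many coins a single click adds depends on the site and is not assumed here.

```python
import time


def load_more_pages(click_load_more, max_clicks=None, pause=0.0):
    """Call `click_load_more` until it returns False or max_clicks is reached.

    `click_load_more` is a hypothetical callable: it should click the
    button and return True, or return False when no button is left.
    max_clicks=None means no limit. Returns the number of clicks made.
    """
    clicks = 0
    while max_clicks is None or clicks < max_clicks:
        if not click_load_more():
            break
        clicks += 1
        # Give the page time to render the newly loaded rows
        time.sleep(pause)
    return clicks
```

With Selenium, `click_load_more` could wrap the notebook's `find_element` lookup, `scrollIntoView` call and `click()`, returning False on `NoSuchElementException`; it would then be called as `load_more_pages(click_load_more, max_clicks=3, pause=10)`.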