# My Datacamp Completed Courses List

By **Daniel Palacio** (github.com/palaciodaniel) - August 2020.

### 0. Introduction

Before starting, a note on installation: on my machine (Linux Mint), installing the *scrapy* library failed with the error *'command 'gcc' failed with exit status 1'*.

Installing the 'libxml2-dev' package on my OS fixed it. For reference, I found the solution on this [StackOverflow page](https://stackoverflow.com/questions/10927492/getting-gcc-failed-error-while-installing-scrapy).

In [None]:
# As mentioned, the 'scrapy' module is required to execute this program.
# If you don't have it installed, erase the '#' sign on the following line...

# !pip install scrapy
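
If it is unclear whether *scrapy* is already available, a quick check can be run before uncommenting the install line. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of the original notebook.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


# 'scrapy' is the module this notebook needs.
print("scrapy installed:", is_installed("scrapy"))
```

If the check prints `False`, the `pip install` line above is the next step.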

### 1. Creating (and executing) a Spider for Web Scraping

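Once the cell below finishes running, it creates a file named "lista_cursos_completados.py" containing the titles of all the Datacamp courses I have completed so far. Note that `process.start()` can only be called once per kernel session; to run the spider again, restart the kernel first.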

In [None]:
import scrapy
from scrapy.crawler import CrawlerProcess

# Extract the course titles from my Datacamp profile

class DatacampProfileSpider(scrapy.Spider):
    
    name = "daniel_palacio"
    
    def start_requests(self):
        
        url = "https://www.datacamp.com/profile/danielpalacio"
        
        yield scrapy.Request(url=url, callback=self.extraction)
    
    def extraction(self, response):
        
        course_list = response.xpath("//*[contains(@class, 'course-block__title')]//text()").extract()
        
        pyfile = "lista_cursos_completados.py"
        
        with open(pyfile, "w") as f:
            f.writelines([course + "," for course in course_list])

# Initializing Spider.            

process = CrawlerProcess()

process.crawl(DatacampProfileSpider)

process.start()
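
Joining titles with commas works as long as no title contains a comma itself. As a hedged alternative (not what the spider above does), the titles could be written one per line. The sketch below uses made-up titles and a temporary file; `save_titles` and `load_titles` are hypothetical helper names.

```python
import tempfile
from pathlib import Path


def save_titles(path: Path, titles: list[str]) -> None:
    """Write one stripped, non-empty title per line."""
    cleaned = [t.strip() for t in titles if t.strip()]
    path.write_text("".join(t + "\n" for t in cleaned), encoding="utf-8")


def load_titles(path: Path) -> list[str]:
    """Read the titles back as a list, one element per line."""
    return path.read_text(encoding="utf-8").splitlines()


# Made-up sample data, including an empty entry and a title with a comma.
sample = ["", "Course A", "Course B, Part 2"]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "titles.txt"
    save_titles(target, sample)
    restored = load_titles(target)

print(restored)
```

With this layout, empty entries are dropped at write time and a comma inside a title survives the round trip.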

### 2. Cleaning the downloaded list

The contents of 'lista_cursos_completados.py' are a single unsorted, comma-separated string, so they need to be cleaned.

First, though, we need to force the Notebook to display long outputs in full instead of inside a scrollable frame.

In [1]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

<IPython.core.display.Javascript object>
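
The cleaning cell that follows fixes two problematic entries by hand. A more general sketch, assuming the same comma-separated input, drops every empty entry and sorts case-insensitively instead of renaming a title. The function name `clean_titles` and the sample string are hypothetical. Note that a case-insensitive sort orders some titles differently from plain `sorted()`: for example, "Advanced ..." comes before "AI ...".

```python
def clean_titles(raw: str) -> list[str]:
    """Split a comma-separated string, drop blanks, sort ignoring case."""
    titles = [part.strip() for part in raw.split(",")]
    titles = [title for title in titles if title]
    return sorted(titles, key=str.casefold)


# Made-up sample resembling the file contents.
sample = ",pandas Foundations,AI Fundamentals,Advanced Deep Learning with Keras,"
print(*clean_titles(sample), sep="\n")
```

Because the original capitalization is left untouched, "pandas Foundations" keeps its official spelling while still sorting among the other titles starting with "P".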

In [2]:
from pathlib import Path

# 1. Read the downloaded list of courses into a variable.

complete_course_list = Path("lista_cursos_completados.py").read_text()

# 2. 'read_text' already returns a single "str", so no conversion is needed here.

# 3. The 'split' method turns the string into a list, starting a new element at every comma.

complete_course_list = complete_course_list.split(",")

# 4. There are two problematic entries that need to be addressed:

#   a) The first element of the list is empty, so we'll erase it.

complete_course_list = complete_course_list[1:]

#   b) One title starts with a lowercase letter ("pandas Foundations").
#      We capitalize it because 'sorted' places uppercase letters before
#      lowercase ones, so it would otherwise end up at the very end.

for index, course in enumerate(complete_course_list):
    if course == "pandas Foundations":
        complete_course_list[index] = "Pandas Foundations"

# 5. With everything cleaned, we can finally sort the list alphabetically.

sorted_course_list = sorted(complete_course_list)

# 6. Printing ordered list

print(*sorted_course_list, sep = "\n")



AI Fundamentals
Advanced Deep Learning with Keras
Analyzing Police Activity with pandas
Big Data Fundamentals with PySpark
Building and Distributing Packages with Conda
Building and Optimizing Triggers in SQL Server
Case Study: School Budgeting with Machine Learn...
Cleaning Data in Python
Cleaning Data in Python
Cloud Computing for Everyone
Cluster Analysis in Python
Command Line Automation in Python
Conda Essentials
Creating Robust Workflows in Python
Data Analysis in Spreadsheets
Data Engineering for Everyone
Data Manipulation with pandas
Data Processing in Shell
Data Science for Business
Data Science for Everyone
Data Types for Data Science in Python
Data Visualization for Everyone
Database Design
Dealing with Missing Data in Python
Dimensionality Reduction in Python
Exploratory Data Analysis in Python
Exploratory Data Analysis in SQL
Extreme Gradient Boosting with XGBoost
Feature Engineering for Machine Learning in Python
Feature Engineering for NLP in Python
Functions for Manipu