### Scraping video titles, descriptions, and web links from a webpage using the BeautifulSoup, requests, and lxml packages

In [1]:
from bs4 import BeautifulSoup
import requests
import csv

In [2]:
# Getting the response object and reading its text attribute, which holds the source code of the webpage as a string.

In [3]:
source = requests.get('http://coreyms.com').text
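
The one-liner above assumes the request succeeds and never stalls. A minimal sketch of a more defensive fetch follows. The helper name `fetch_html` and the 10-second timeout are assumptions made for illustration and are not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the HTML source of `url` as a string.

    `fetch_html` and the 10-second default are illustrative assumptions.
    """
    # Without a timeout, requests.get can wait indefinitely on an unresponsive server.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

With such a helper, `source = fetch_html('http://coreyms.com')` could take the place of the cell above.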

In [4]:
# Creating a BeautifulSoup object of the source code and using lxml as a parser.

In [5]:
soup = BeautifulSoup(source, 'lxml')

In [6]:
# Checking the HTML structure of the first video on the webpage.

In [7]:
article = soup.find('article')

print(article.prettify())

# Each article tag in the source code contains the details of one video on the webpage.

<article class="post-1670 post type-post status-publish format-standard has-post-thumbnail category-development category-python tag-gzip tag-shutil tag-zip tag-zipfile entry" itemscope="" itemtype="https://schema.org/CreativeWork">
 <header class="entry-header">
  <h2 class="entry-title" itemprop="headline">
   <a class="entry-title-link" href="https://coreyms.com/development/python/python-tutorial-zip-files-creating-and-extracting-zip-archives" rel="bookmark">
    Python Tutorial: Zip Files – Creating and Extracting Zip Archives
   </a>
  </h2>
  <p class="entry-meta">
   <time class="entry-time" datetime="2019-11-19T13:02:37-05:00" itemprop="datePublished">
    November 19, 2019
   </time>
   by
   <span class="entry-author" itemprop="author" itemscope="" itemtype="https://schema.org/Person">
    <a class="entry-author-link" href="https://coreyms.com/author/coreymschafer" itemprop="url" rel="author">
     <span class="entry-author-name" itemprop="name">
      Corey Schafer
     </spa

In [8]:
# Inspecting the article tag of the first video shows that the 'a' tag within the 'h2' tag contains
# the heading of the video.

In [9]:
# Parsing the heading of the first video

In [10]:
headline = article.h2.a.text

In [11]:
headline

'Python Tutorial: Zip Files – Creating and Extracting Zip Archives'

In [12]:
# Parsing the summary of the video, which is the first 'p' tag under the 'div' tag with class 'entry-content'.

In [13]:
summary = article.find('div', class_="entry-content").p.text
print(summary)

In this video, we will be learning how to create and extract zip archives. We will start by using the zipfile module, and then we will see how to do this using the shutil module. We will learn how to do this with single files and directories, as well as learning how to use gzip as well. Let’s get started…
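
Chained lookups such as `article.h2.a.text` and `article.find(...).p.text` raise `AttributeError` when any tag in the chain is missing. The sketch below shows one way to return `None` instead. The sample HTML and the helper names `text_or_none` and `parse_article` are made up for illustration, and the built-in `html.parser` is used in place of `lxml` only so the sketch runs without an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page: the second article has no summary div.
SAMPLE_HTML = """
<article>
  <h2 class="entry-title"><a href="#">First video</a></h2>
  <div class="entry-content"><p>First summary.</p></div>
</article>
<article>
  <h2 class="entry-title"><a href="#">Second video</a></h2>
</article>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None when the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_article(article):
    """Extract (headline, summary) from one article tag without raising on gaps."""
    # select_one returns None when nothing matches the CSS selector.
    headline = text_or_none(article.select_one('h2 a'))
    summary = text_or_none(article.select_one('div.entry-content p'))
    return headline, summary


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
results = [parse_article(a) for a in soup.find_all('article')]
print(results)
# [('First video', 'First summary.'), ('Second video', None)]
```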


In [14]:
# The src attribute of the iframe tag holds the embedded video link, from which we can parse the YouTube video id.
# It is z0gguhEmWiY for the video in the first article tag. See below.

In [15]:
vid_src = article.find('iframe', class_="youtube-player")['src']
print(vid_src)

https://www.youtube.com/embed/z0gguhEmWiY?version=3&rel=1&showsearch=0&showinfo=1&iv_load_policy=1&fs=1&hl=en-US&autohide=2&wmode=transparent


In [16]:
# Splitting the string on '/' and extracting the list element (index 4) that contains the video id.

In [17]:
vid_id = vid_src.split('/')[4]
print(vid_id)

z0gguhEmWiY?version=3&rel=1&showsearch=0&showinfo=1&iv_load_policy=1&fs=1&hl=en-US&autohide=2&wmode=transparent


In [18]:
# Splitting that element on '?' and extracting the first piece, which is the video id.

In [19]:
vid_id = vid_id.split('?')[0]

In [20]:
print(vid_id)

z0gguhEmWiY
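
Indexing the result of `split('/')` relies on the video id always being the fifth piece of the URL. As an alternative sketch, the standard library's `urllib.parse` can separate the path from the query string first. The function name `extract_video_id` is a hypothetical helper and not part of the original notebook.

```python
from urllib.parse import urlparse


def extract_video_id(embed_url):
    """Return the last path segment of an embed URL, or None if the path is empty."""
    # urlparse keeps the query string out of .path, e.g. '/embed/z0gguhEmWiY'.
    path = urlparse(embed_url).path
    segments = [segment for segment in path.split('/') if segment]
    return segments[-1] if segments else None


src = 'https://www.youtube.com/embed/z0gguhEmWiY?version=3&rel=1&showsearch=0'
print(extract_video_id(src))
# z0gguhEmWiY
```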


In [21]:
# Below is the format of a YouTube video link.

In [22]:
yt_link = f'https://youtube.com/watch?v={vid_id}'

In [23]:
print(yt_link)

https://youtube.com/watch?v=z0gguhEmWiY


### Parsing the title, summary, and video link for every video on the page and saving them to a CSV file for further use

In [25]:
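# Creating a csv file object in write mode. An explicit UTF-8 encoding keeps characters such as
# '–' and '’' in the titles and summaries from failing on platforms with a different default encoding.

csv_file = open('cms_scrape.csv', 'w', newline='', encoding='utf-8')

# Creating a writer object for the file object created above.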

csv_writer = csv.writer(csv_file)

# Writing headings (column names) of the file.

csv_writer.writerow(['headline', 'summary', 'video_link'])


# Looping through each article tag of the source code to extract title, summary and video links.

for article in soup.find_all('article'):
    
    # Parsing title/heading of the video.
    headline = article.h2.a.text
    print(headline)
    
    # Parsing summary/description of the video.
    summary = article.find('div', class_="entry-content").p.text
    print(summary)
    
    # Using a try/except block because an article without an embedded video would otherwise raise an error and stop the loop.
    try:  
        # Parsing the src attribute of the iframe tag
        vid_src = article.find('iframe', class_="youtube-player")['src']
        
        
        # Splitting the src string with '/' to parse element of the list that contains video id.
        vid_id = vid_src.split('/')[4]
        
        
        # Further splitting the video id element string and extracting the video id.
        vid_id = vid_id.split('?')[0]

        # Creating a full youtube video link using the video id thus parsed.
        yt_link = f'https://youtube.com/watch?v={vid_id}'
    except (TypeError, KeyError, IndexError):  # no iframe found, no src attribute, or an unexpected URL shape
        yt_link = None
    
    print(yt_link)
    
    # Writing the headline, summary and video link parsed above into the csv file.
    csv_writer.writerow([headline, summary, yt_link])
    
    # Printing a blank line to separate the output for each video.
    print()
    
# Closing the file object.
   
csv_file.close()
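
If an exception is raised inside the loop, `csv_file.close()` is never reached. A `with` block closes the file either way. The sketch below shows that pattern and then reads the file back with `csv.DictReader`. The rows and the temporary file name are placeholders for illustration, not scraped data.

```python
import csv
import os
import tempfile

# Placeholder rows standing in for the (headline, summary, yt_link) values built in the loop.
rows = [
    ('Example title – part one', 'It’s a placeholder summary.', 'https://youtube.com/watch?v=z0gguhEmWiY'),
    ('Example post without a video', 'Text only.', None),
]

# Write to a temporary directory so the sketch does not overwrite cms_scrape.csv.
path = os.path.join(tempfile.mkdtemp(), 'cms_scrape_demo.csv')

# The file is closed when the with block exits, even if a write raises.
with open(path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['headline', 'summary', 'video_link'])
    writer.writerows(rows)

# DictReader uses the header row as keys for each record.
with open(path, newline='', encoding='utf-8') as f:
    records = list(csv.DictReader(f))

# csv.writer stores None as an empty string, so a missing link reads back as ''.
print(repr(records[1]['video_link']))
# ''
```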


Python Tutorial: Zip Files – Creating and Extracting Zip Archives
In this video, we will be learning how to create and extract zip archives. We will start by using the zipfile module, and then we will see how to do this using the shutil module. We will learn how to do this with single files and directories, as well as learning how to use gzip as well. Let’s get started…
https://youtube.com/watch?v=z0gguhEmWiY

Python Data Science Tutorial: Analyzing the 2019 Stack Overflow Developer Survey
In this Python Programming video, we will be learning how to download and analyze real-world data from the 2019 Stack Overflow Developer Survey. This is terrific practice for anyone getting into the data science field. We will learn different ways to analyze this data and also some best practices. Let’s get started…
https://youtube.com/watch?v=_P7X8tMplsw

Python Multiprocessing Tutorial: Run Code in Parallel Using the Multiprocessing Module
In this Python Programming video, we will be learning how t