<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# GitHub - Get DataFrame from project view

**Tags:** #github #dataframe #beautifulsoup #projectview #scraping #python

**Author:** [Benjamin Filly](https://www.linkedin.com/in/benjamin-filly-05427727a/)

**Description:** This notebook shows how to build a DataFrame from a GitHub project view using BeautifulSoup. It is useful for organizations that want to quickly extract data from a GitHub project view.

**References:**
- [BeautifulSoup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [GitHub Project View](https://help.github.com/en/github/managing-your-work-on-github/about-project-boards)

## Input

### Import libraries

In [7]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

### Setup Variables
- `url`: URL of the project view page

In [4]:
url = "https://github.com/orgs/jupyter-naas/projects/10/views/19"

## Model

In [10]:
# Get HTML from URL
response = requests.get(url)
html = response.text

# Parse HTML
soup = BeautifulSoup(html, "html.parser")

soup


<!DOCTYPE html>

<html class="height-full" data-a11y-animated-images="system" data-color-mode="auto" data-dark-theme="dark" data-light-theme="light" lang="en">
<head>
<meta charset="utf-8"/>
<link href="https://github.githubassets.com" rel="dns-prefetch"/>
<link href="https://avatars.githubusercontent.com" rel="dns-prefetch"/>
<link href="https://github-cloud.s3.amazonaws.com" rel="dns-prefetch"/>
<link href="https://user-images.githubusercontent.com/" rel="dns-prefetch"/>
<link crossorigin="" href="https://github.githubassets.com" rel="preconnect"/>
<link href="https://avatars.githubusercontent.com" rel="preconnect"/>
<link crossorigin="anonymous" href="https://github.githubassets.com/assets/light-8cafbcbd78f4.css" media="all" rel="stylesheet"><link crossorigin="anonymous" href="https://github.githubassets.com/assets/dark-31dc14e38457.css" media="all" rel="stylesheet"><link crossorigin="anonymous" data-color-theme="dark_dimmed" data-href="https://github.githubassets.com/assets/dark_d

### Get DataFrame from project view

This function fetches the project view page, parses it with BeautifulSoup, and returns a DataFrame with one row per card and the columns `title`, `description`, and `url`.

In [8]:
def get_dataframe_from_project_view(url):
    # Get HTML from URL
    response = requests.get(url)
    html = response.text

    # Parse HTML
    soup = BeautifulSoup(html, "html.parser")

    # Get all cards
    cards = soup.find_all("div", {"class": "js-project-column-card"})

    # Collect one row per card
    rows = []
    for card in cards:
        # find() returns None when an element is missing, so look up first
        title = card.find("h3", {"class": "h4 lh-condensed"})
        description = card.find("p", {"class": "text-gray mb-2"})
        # Named `link` so the `url` argument is not overwritten
        link = card.find("a", {"class": "h4 lh-condensed"})

        rows.append(
            {
                "title": title.text if title else None,
                "description": description.text if description else None,
                "url": link["href"] if link else None,
            }
        )

    # Build the dataframe once (DataFrame.append was removed in pandas 2.0)
    return pd.DataFrame(rows, columns=["title", "description", "url"])
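
To check the parsing logic without calling GitHub, the same card extraction can be run against a small inline HTML string. The sketch below is only an illustration: `SAMPLE_HTML` and `cards_to_dataframe` are hypothetical names, the markup is made up to match the class names used in the function above, and CSS selectors (`select_one`) are used here in place of `find`.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup (not real GitHub output): two cards that reuse
# the class names targeted by get_dataframe_from_project_view.
SAMPLE_HTML = """
<div class="js-project-column-card">
  <a class="h4 lh-condensed" href="/org/repo/issues/1">
    <h3 class="h4 lh-condensed">First card</h3>
  </a>
  <p class="text-gray mb-2">A short description</p>
</div>
<div class="js-project-column-card">
  <h3 class="h4 lh-condensed">Card without link or description</h3>
</div>
"""


def cards_to_dataframe(html):
    """Parse card markup into a DataFrame, tolerating missing elements."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="js-project-column-card"):
        # select_one returns None when nothing matches the selector
        title = card.select_one("h3.h4.lh-condensed")
        description = card.select_one("p.text-gray.mb-2")
        link = card.select_one("a.h4.lh-condensed")
        rows.append(
            {
                "title": title.get_text(strip=True) if title else None,
                "description": description.get_text(strip=True) if description else None,
                "url": link["href"] if link else None,
            }
        )
    # Passing columns keeps the schema stable even when no card is found
    return pd.DataFrame(rows, columns=["title", "description", "url"])


sample_df = cards_to_dataframe(SAMPLE_HTML)
print(sample_df)
```

If the HTML contains no element with the `js-project-column-card` class, this returns an empty DataFrame that still has the three expected columns, which makes a selector mismatch easy to spot.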

## Output

### Display result

In [9]:
df = get_dataframe_from_project_view(url)
df

Unnamed: 0,title,description,url
