<a href="https://github.com/seontology/seontology"><img src="https://github.com/seontology/seontology/blob/main/.assets/seontology_logo.png?raw=true"></img></a>


# Representing a web page as a `WebPage` class

This notebook is an initial workflow for building a representation of the `WebPage` class, developed as part of **SEOntology** by [WordLift](https://wordlift.io/). It covers Phase 1 of the three phases outlined below:

## Phase 1: Extracting standard elements that exist on almost any web page

* Example elements: title tag, meta description, images, links, etc.
* Crawl a set of URLs or a full website
* Extract the required elements from each page
* Create a JSON representation according to [SEOntology](https://github.com/seontology/seontology)

## Phase 2: Extracting certain elements that may exist on a web page in various formats
* Examples: Author information, publishing date, last update, etc.
* This phase requires custom crawling to extract this information.
* In some cases the information is available in an easily parseable format such as JSON-LD, so extraction can be automated.
* In other cases custom extraction using XPath/CSS selectors might be required.
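As a rough illustration of the JSON-LD case, the sketch below pulls `<script type="application/ld+json">` blocks out of an HTML string using only the Python standard library. The `JSONLDParser` class, the `extract_jsonld` helper, and the sample HTML are hypothetical names and data made up for this example; they are not part of SEOntology or advertools.

```python
import json
from html.parser import HTMLParser


class JSONLDParser(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == 'script' and dict(attrs).get('type') == 'application/ld+json':
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content can arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == 'script' and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads(''.join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD blocks


def extract_jsonld(html):
    """Return a list with one parsed object per JSON-LD block found in `html`."""
    parser = JSONLDParser()
    parser.feed(html)
    return parser.items


# Hypothetical page used only to exercise the helper.
sample_html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "author": {"@type": "Person", "name": "Jane Doe"},
 "datePublished": "2024-03-15"}
</script>
</head><body></body></html>
"""

for item in extract_jsonld(sample_html):
    print(item.get('@type'), item.get('datePublished'), item['author']['name'])
```

Pages that expose author or date information only in the visible HTML would still need site-specific XPath or CSS selectors.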

## Phase 3: Adding website data, which can be automated but requires access to the data
* Sources: Google Search Console, Google Analytics, etc.
* Special scripts can be written to standardize the process of extracting this information from the relevant APIs.

In [1]:
import advertools as adv
import pandas as pd
pd.options.display.max_columns = None
pd.set_option('future.no_silent_downcasting', True)

## Get website URLs from XML sitemap and crawl the website

In [2]:
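# One-time crawl (uncomment to re-run): read the URLs from the XML sitemap, then crawl them.
# wl = adv.sitemap_to_df('https://wordlift.io/sitemap.xml')
# adv.crawl(wl['loc'].dropna(), 'wordlift_crawl.jl')
# Load the crawl output (JSON Lines: one crawled page per line).
crawl_df = pd.read_json('wordlift_crawl.jl', lines=True)
crawl_df.head(3)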

Unnamed: 0,url,title,meta_desc,viewport,charset,h1,h3,h4,canonical,alt_href,og:locale,og:type,og:title,og:description,og:url,og:site_name,og:image,og:image:width,og:image:height,og:image:type,twitter:card,twitter:label1,twitter:data1,jsonld_@context,jsonld_@graph,jsonld_1_@context,jsonld_1_@id,jsonld_1_@type,jsonld_1_description,jsonld_1_mainEntityOfPage,jsonld_1_image,jsonld_1_name,jsonld_1_url,jsonld_1_provider,jsonld_1_educationalLevel,jsonld_1_inLanguage,jsonld_1_about,jsonld_1_mentions,jsonld_1_offers.availability,jsonld_1_offers.url,jsonld_1_offers.@type,jsonld_1_sameAs,jsonld_1_alternateName,jsonld_1_legalName,jsonld_1_email,jsonld_1_telephone,jsonld_1_address.@type,jsonld_1_address.streetAddress,jsonld_1_address.postOfficeBoxNumber,jsonld_1_address.postalCode,jsonld_1_address.addressLocality,jsonld_1_address.addressRegion,jsonld_1_address.addressCountry,body_text,size,download_timeout,download_slot,download_latency,depth,status,links_url,links_text,links_nofollow,nav_links_url,nav_links_text,nav_links_nofollow,header_links_url,header_links_text,header_links_nofollow,footer_links_url,footer_links_text,footer_links_nofollow,img_src,img_width,img_height,img_alt,img_fetchpriority,img_decoding,img_srcset,img_sizes,ip_address,crawl_time,resp_headers_Date,resp_headers_Content-Type,resp_headers_Cf-Ray,resp_headers_Cf-Cache-Status,resp_headers_Cache-Control,resp_headers_Expires,resp_headers_Last-Modified,resp_headers_Link,resp_headers_Set-Cookie,resp_headers_Vary,resp_headers_Cf-Apo-Via,resp_headers_Cf-Edge-Cache,resp_headers_Ki-Cache-Type,resp_headers_Ki-Cf-Cache-Status,resp_headers_Ki-Edge,resp_headers_Ki-Edge-O2O,resp_headers_Ki-Origin,resp_headers_Nel,resp_headers_Report-To,resp_headers_X-Content-Type-Options,resp_headers_X-Edge-Location-Klb,resp_headers_X-Frame-Options,resp_headers_X-Kinsta-Cache,resp_headers_Server,request_headers_Accept,request_headers_Accept-Language,request_headers_User-Agent,request_headers_Accept-Encoding,h2,jsonld_1_thumbnailUrl,jsonld_1_
uploadDate,jsonld_1_contentUrl,jsonld_1_duration,jsonld_1_embedUrl,jsonld_1_expires,jsonld_1_hasPart,jsonld_1_mentions.@id,jsonld_1_headline,jsonld_1_datePublished,jsonld_1_dateModified,jsonld_1_wordCount,jsonld_1_commentCount,jsonld_1_publisher.@type,jsonld_1_publisher.@id,jsonld_1_publisher.name,jsonld_1_publisher.sameAs,jsonld_1_publisher.logo.@type,jsonld_1_publisher.logo.url,jsonld_1_publisher.logo.width,jsonld_1_publisher.logo.height,jsonld_1_author.@type,jsonld_1_author.@id,jsonld_1_author.name,jsonld_1_author.givenName,jsonld_1_author.familyName,jsonld_1_author.url,jsonld_1_publication.endDate,jsonld_1_publication.startDate,jsonld_1_publication.name,jsonld_1_publication.@type,redirect_times,redirect_ttl,redirect_urls,redirect_reasons,request_headers_Cookie,jsonld_1_itemListElement,jsonld_1_coursePrerequisites,jsonld_1_competencyRequired,twitter:title,twitter:description,twitter:image,jsonld_1_mainEntity,jsonld_1_publication.isLiveBroadcast,jsonld_1_teaches,jsonld_1_affiliation.@id,alt_hreflang,twitter:label2,twitter:data2,jsonld_1_articleSection,jsonld_1_video,jsonld_1_author,jsonld_1_about.@id,jsonld_1_knows,jsonld_1_birthDate,jsonld_1_birthPlace.@id,jsonld_1_interactionStatistic.@type,jsonld_1_interactionStatistic.interactionType.@type,jsonld_1_interactionStatistic.userInteractionCount,jsonld_1_potentialAction.@type,jsonld_1_potentialAction.target,jsonld_1_potentialAction.query-input,resp_headers_Age,h6,h5,jsonld_1_isRelatedTo,jsonld_1_isSimilarTo,jsonld_1_areaServed,jsonld_1_provider.@id,jsonld_1_title,jsonld_1_datePosted,jsonld_1_hiringOrganization,jsonld_1_jobLocation,jsonld_1_employmentType,jsonld_1_validThrough,jsonld_1_experienceRequirements,jsonld_1_responsibilities,jsonld_1_employerOverview,jsonld_1_industry,jsonld_1_baseSalary.currency,jsonld_1_baseSalary.@type,jsonld_1_baseSalary.value.@type,jsonld_1_baseSalary.value.unitText,jsonld_1_educationRequirements,jsonld_1_audience,jsonld_1_category,jsonld_1_serviceType,jsonld_1_serviceOutput
0,https://wordlift.io/academy-entries/knowledge-...,Knowledge Graph and Panels | Webinar With Andr...,Join the webinar to learn all the relevant tip...,"width=device-width, initial-scale=1.0, maximum...",UTF-8,Knowledge Graph and Panels | Webinar With Andr...,Are you ready for the next SEO? Try WordLift t...,Company@@Plans@@Learn@@Helpful Links,https://wordlift.io/academy-entries/knowledge-...,https://wordlift.io/feed/@@https://wordlift.io...,en_US,article,Knowledge Graph and Panels | Webinar With Andr...,Join the webinar to learn all the relevant tip...,https://wordlift.io/academy-entries/knowledge-...,AI-Powered SEO • WordLift,https://wordlift.io/wp-content/uploads/2020/04...,961.0,540.0,image/png,summary_large_image,Est. reading time,2 minutes,https://schema.org,"[{'@type': 'WebPage', '@id': 'https://wordlift...",http://schema.org,http://data.wordlift.io/wl01893/academy-entrie...,Course,"[In the world of zero-click searches, the pote...",https://wordlift.io/academy-entries/knowledge-...,"[{'@type': 'ImageObject', 'url': 'https://eacn...",[Knowledge Graph and Panels | Webinar With And...,https://wordlift.io/academy-entries/knowledge-...,[{'@id': 'http://data.wordlift.io/wl01893/enti...,intermediate,En,[{'@id': 'http://data.wordlift.io/wl01893/enti...,[{'@id': 'http://data.wordlift.io/wl01893/enti...,InStock,https://wordlift.io/academy-entries/knowledge-...,Offer,,,,,,,,,,,,,Solutions \n \n By Market \n \n SEO Management...,101567,180,wordlift.io,1.197423,0,200,https://wordlift.io/@@https://wordlift.io/acad...,\n\n@@Solutions@@By Market@@SEO Management Ser...,False@@False@@False@@False@@False@@False@@Fals...,https://wordlift.io/academy-entries/knowledge-...,Solutions@@By Market@@SEO Management ServiceSt...,False@@False@@False@@False@@False@@False@@Fals...,https://wordlift.io/@@https://wordlift.io/acad...,\n\n@@Solutions@@By Market@@SEO Management Ser...,False@@False@@False@@False@@False@@False@@Fals...,https://wordlift.io/choose-your-plan/@@https:/...,Try it for 
free@@Book A Demo@@Get a quote@@Boo...,False@@False@@False@@False@@False@@False@@Fals...,https://eacn2n47zot.exactdn.com/wp-content/upl...,4010@@1080@@847@@,1241@@675@@303@@,AI-Powered SEO • WordLift@@Knowledge Graph and...,@@@@high@@,@@@@async@@,@@@@https://eacn2n47zot.exactdn.com/wp-content...,@@@@(min-width: 0px) and (max-width: 480px) 48...,104.18.8.209,2024-03-15 19:04:59,"Fri, 15 Mar 2024 19:04:59 GMT",text/html; charset=UTF-8,864ec5b4391132a8-AMM,MISS,"public, max-age=14400","Fri, 15 Mar 2024 23:04:59 GMT","Fri, 15 Mar 2024 19:04:59 GMT","<https://wordlift.io/wp-json/>; rel=""https://a...","sq51mybo=m76hppjgn86b; expires=Wed, 20-Mar-202...",Accept-Encoding,"origin,miss","cache,platform=wordpress",,BYPASS,v=20.2.7;mv=3.0.4,yes,g1p,"{""success_fraction"":0.01,""report_to"":""cf-nel"",...","{""endpoints"":[{""url"":""https:\/\/a.nel.cloudfla...",nosniff,1,SAMEORIGIN,MISS,cloudflare,"text/html,application/xhtml+xml,application/xm...",en,advertools/0.14.2,"gzip, deflate, br",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,https://wordlift.io/academy-entries/mobile-fir...,Mobile First Indexing | Webinar with Cindy Krum,"If you are working in SEO, you really can’t ig...","width=device-width, initial-scale=1.0, maximum...",UTF-8,Mobile First Indexing | Webinar with Cindy Krum,Are you ready for the next SEO? Try WordLift t...,Company@@Plans@@Learn@@Helpful Links,https://wordlift.io/webinar-mobile-first-index...,https://wordlift.io/feed/@@https://wordlift.io...,en_US,article,Mobile First Indexing | Webinar with Cindy Krum,"If you are working in SEO, you really can’t ig...",https://wordlift.io/webinar-mobile-first-index...,AI-Powered SEO • WordLift,https://wordlift.io/wp-content/uploads/2020/03...,1280.0,720.0,image/png,summary_large_image,Est. reading time,2 minutes,https://schema.org,"[{'@type': 'WebPage', '@id': 'https://wordlift...",http://schema.org,http://data.wordlift.io/wl01893/academy-entrie...,VideoObject,"[If you are working in SEO, you really can’t i...",https://wordlift.io/academy-entries/mobile-fir...,"[{'@type': 'ImageObject', 'url': 'https://eacn...",[Mobile First Indexing | Webinar with Cindy Kr...,https://wordlift.io/academy-entries/mobile-fir...,,,,[{'@id': 'http://data.wordlift.io/wl01893/enti...,[{'@id': 'http://data.wordlift.io/wl01893/enti...,,,,,,,,,,,,,,,,Solutions \n \n By Market \n \n SEO Management...,101140,180,wordlift.io,1.762641,0,200,https://wordlift.io/@@https://wordlift.io/acad...,\n\n@@Solutions@@By Market@@SEO Management Ser...,False@@False@@False@@False@@False@@False@@Fals...,https://wordlift.io/academy-entries/mobile-fir...,Solutions@@By Market@@SEO Management ServiceSt...,False@@False@@False@@False@@False@@False@@Fals...,https://wordlift.io/@@https://wordlift.io/acad...,\n\n@@Solutions@@By Market@@SEO Management Ser...,False@@False@@False@@False@@False@@False@@Fals...,https://wordlift.io/choose-your-plan/@@https:/...,Try it for free@@Book A Demo@@Get a 
quote@@Boo...,False@@False@@False@@False@@False@@False@@Fals...,https://eacn2n47zot.exactdn.com/wp-content/upl...,4010@@1080@@300@@,1241@@675@@300@@,AI-Powered SEO • WordLift@@Mobile First Indexi...,@@@@high@@,@@@@async@@,@@@@https://eacn2n47zot.exactdn.com/wp-content...,"@@@@(max-width: 300px) 100vw, 300px@@",104.18.8.209,2024-03-15 19:04:59,"Fri, 15 Mar 2024 19:04:59 GMT",text/html; charset=UTF-8,864ec5b43c6632ad-AMM,MISS,"public, max-age=14400","Fri, 15 Mar 2024 23:04:59 GMT","Fri, 15 Mar 2024 19:04:59 GMT","<https://wordlift.io/wp-json/>; rel=""https://a...","sq51mybo=m76hppjgn86b; expires=Wed, 20-Mar-202...",Accept-Encoding,"origin,miss","cache,platform=wordpress",,BYPASS,v=20.2.7;mv=3.0.4,yes,g1p,"{""success_fraction"":0.01,""report_to"":""cf-nel"",...","{""endpoints"":[{""url"":""https:\/\/a.nel.cloudfla...",nosniff,1,SAMEORIGIN,EXPIRED,cloudflare,"text/html,application/xhtml+xml,application/xm...",en,advertools/0.14.2,"gzip, deflate, br",Why You Can’t Ignore Mobile First Indexing Any...,https://wordlift.io/wp-content/uploads/2020/03...,2019/05/21,https://cdn.jwplayer.com/manifests/Zx2gEWsn.m3u8,PT57M45S,https://cdn.jwplayer.com/manifests/Zx2gEWsn.m3u8,2021/12/31,"[{'endOffset': '742', 'startOffset': '397', 'u...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,https://wordlift.io/academy-entries/dati-strut...,Come sfruttare i dati strutturati per aumentar...,Scopri come aumentare esposizione e visibilità...,"width=device-width, initial-scale=1.0, maximum...",UTF-8,Come sfruttare i dati strutturati per aumentar...,Are you ready for the next SEO? Try WordLift t...,Company@@Plans@@Learn@@Helpful Links,https://wordlift.io/it/academy-entries/dati-st...,https://wordlift.io/feed/@@https://wordlift.io...,en_US,article,Come sfruttare i dati strutturati per aumentar...,Scopri come aumentare esposizione e visibilità...,https://wordlift.io/it/academy-entries/dati-st...,AI-Powered SEO • WordLift,,,,,summary_large_image,Est. reading time,1 minute,https://schema.org,"[{'@type': 'WebPage', '@id': 'https://wordlift...",http://schema.org,http://data.wordlift.io/wl01893/academy-entrie...,Course,[Scopri con Marco Maltraversi come aumentare e...,https://wordlift.io/academy-entries/dati-strut...,,[Come sfruttare i dati strutturati per aumenta...,https://wordlift.io/academy-entries/dati-strut...,[{'@id': 'http://data.wordlift.io/wl01893/enti...,intermediate,It,,[{'@id': 'http://data.wordlift.io/wl01893/enti...,InStock,https://wordlift.io/academy-entries/dati-strut...,Offer,,,,,,,,,,,,,Solutions \n \n By Market \n \n SEO Management...,93751,180,wordlift.io,1.360074,0,200,https://wordlift.io/@@https://wordlift.io/acad...,\n\n@@Solutions@@By Market@@SEO Management Ser...,False@@False@@False@@False@@False@@False@@Fals...,https://wordlift.io/academy-entries/dati-strut...,Solutions@@By Market@@SEO Management ServiceSt...,False@@False@@False@@False@@False@@False@@Fals...,https://wordlift.io/@@https://wordlift.io/acad...,\n\n@@Solutions@@By Market@@SEO Management Ser...,False@@False@@False@@False@@False@@False@@Fals...,https://wordlift.io/choose-your-plan/@@https:/...,Try it for free@@Book A Demo@@Get a quote@@Boo...,False@@False@@False@@False@@False@@False@@Fals...,https://eacn2n47zot.exactdn.com/wp-content/upl...,4010@@,1241@@,AI-Powered SEO • 
WordLift@@,,,,,104.18.8.209,2024-03-15 19:04:59,"Fri, 15 Mar 2024 19:04:59 GMT",text/html; charset=UTF-8,864ec5b45eef32ab-AMM,MISS,"public, max-age=14400","Fri, 15 Mar 2024 23:04:59 GMT","Fri, 15 Mar 2024 19:04:59 GMT","<https://wordlift.io/wp-json/>; rel=""https://a...","sq51mybo=m76hppjgn86b; expires=Wed, 20-Mar-202...",Accept-Encoding,"origin,miss","cache,platform=wordpress",,BYPASS,v=20.2.7;mv=3.0.4,yes,g1p,"{""success_fraction"":0.01,""report_to"":""cf-nel"",...","{""endpoints"":[{""url"":""https:\/\/a.nel.cloudfla...",nosniff,1,SAMEORIGIN,MISS,cloudflare,"text/html,application/xhtml+xml,application/xm...",en,advertools/0.14.2,"gzip, deflate, br",Perché dovrebbe interessarti?@@Cosa imparerai?...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


## Extracting attributes of `WebPage`

* Rename some columns to comply with the spec, e.g. `title` ==> `metaTitleContent`
* Run some checks and create new columns showing the result of each check, e.g. `containsImage` checks whether `img_src` is not empty.

The following elements are extracted:

* `containsImage`
* `containsVideo`
* `hasMetaDescription`
* `hasMetaTitle`
* `usesSchema`
* `Internal_Links`
    * `anchorTextContent`
    * `NoFollow`
    * `Link`
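`containsVideo` is listed above, but the cell that follows does not compute it. One possible check is sketched here, under the assumption that a video is signalled by a `VideoObject` JSON-LD type or a non-empty `jsonld_1_video` column. The `sample` DataFrame and its URLs are hypothetical; only the column names are taken from the crawl output.

```python
import pandas as pd

# Hypothetical mini crawl output; the column names mirror the crawl file.
sample = pd.DataFrame({
    'url': ['https://example.com/a', 'https://example.com/b'],
    'jsonld_1_@type': ['Course', 'VideoObject'],
    'jsonld_1_video': [None, None],
})

# Assumption: a page "contains a video" when its JSON-LD declares a
# VideoObject or has a value in the `video` property.
sample['containsVideo'] = (
    sample['jsonld_1_@type'].eq('VideoObject')
    | sample['jsonld_1_video'].notna()
)
print(sample[['url', 'containsVideo']])
```

Videos embedded without structured data would not be detected this way and would need a selector-based check.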


In [3]:
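# Rename crawl columns to the attribute names used by the spec.
crawl_df = crawl_df.rename(columns={
    'title': 'metaTitleContent',
    'meta_desc': 'metaDescriptionContent'
})
# Boolean checks: True when the corresponding crawl column is not empty.
crawl_df['hasMetaTitle'] = crawl_df['metaTitleContent'].notna()
crawl_df['hasMetaDescription'] = crawl_df['metaDescriptionContent'].notna()
crawl_df['containsImage'] = crawl_df['img_src'].notna()
# True when at least one JSON-LD column has a value for the page.
crawl_df['usesSchema'] = crawl_df.filter(regex='jsonld').notna().sum(axis=1).gt(0)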

## Summarize internal links on the website

In [4]:
link_df = adv.crawlytics.links(crawl_df, internal_url_regex=r'^https://wordlift\.io')
link_df.groupby('url').head(3).head(9)

| | url | link | text | nofollow | internal |
|---|---|---|---|---|---|
| 0 | https://wordlift.io/academy-entries/knowledge-... | https://wordlift.io/ | `\n\n` | False | True |
| 0 | https://wordlift.io/academy-entries/knowledge-... | https://wordlift.io/academy-entries/knowledge-... | Solutions | False | True |
| 0 | https://wordlift.io/academy-entries/knowledge-... | https://wordlift.io/academy-entries/knowledge-... | By Market | False | True |
| 1 | https://wordlift.io/academy-entries/mobile-fir... | https://wordlift.io/ | `\n\n` | False | True |
| 1 | https://wordlift.io/academy-entries/mobile-fir... | https://wordlift.io/academy-entries/mobile-fir... | Solutions | False | True |
| 1 | https://wordlift.io/academy-entries/mobile-fir... | https://wordlift.io/academy-entries/mobile-fir... | By Market | False | True |
| 2 | https://wordlift.io/academy-entries/dati-strut... | https://wordlift.io/ | `\n\n` | False | True |
| 2 | https://wordlift.io/academy-entries/dati-strut... | https://wordlift.io/academy-entries/dati-strut... | Solutions | False | True |
| 2 | https://wordlift.io/academy-entries/dati-strut... | https://wordlift.io/academy-entries/dati-strut... | By Market | False | True |


In [5]:
for url in crawl_df['url']:
    internal_links = link_df[link_df['internal'] & link_df['link'].eq(url)]
    Internal_Links_url = internal_links['url'].str.cat(sep='@@')
    Internal_Links_text = internal_links['text'].str.cat(sep='@@')
    Internal_Links_nofollow = internal_links['nofollow'].astype(str).str.cat(sep='@@')
    crawl_df['Internal_Links_url'] = Internal_Links_url
    crawl_df['Internal_Links_text'] = Internal_Links_text
    crawl_df['Internal_Links_nofollow'] = Internal_Links_nofollow
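Note that the loop above assigns a single string to the whole column on every iteration, so once it finishes each row holds the values computed for the last URL. This is consistent with the identical `Internal_Links_*` values in the sample further down. A per-page variant could be sketched as follows, using a group-by on the link target instead of a loop. The `pages`, `links` and `inbound` names and the example URLs are hypothetical stand-ins for `crawl_df` and `link_df`.

```python
import pandas as pd

# Hypothetical stand-ins for crawl_df and link_df, with the same column names.
pages = pd.DataFrame({'url': ['https://example.com/', 'https://example.com/a']})
links = pd.DataFrame({
    'url': ['https://example.com/', 'https://example.com/a', 'https://example.com/a'],
    'link': ['https://example.com/a', 'https://example.com/', 'https://other.com/'],
    'text': ['Page A', 'Home', 'Other'],
    'nofollow': [False, False, True],
    'internal': [True, True, False],
})


def join_values(series):
    """Concatenate the values of a string column with the '@@' separator."""
    return series.str.cat(sep='@@')


# Keep internal links only, then join the values per link target.
inbound = (
    links[links['internal']]
    .assign(nofollow=lambda df: df['nofollow'].astype(str))
    .groupby('link')
    .agg(
        Internal_Links_url=('url', join_values),
        Internal_Links_text=('text', join_values),
        Internal_Links_nofollow=('nofollow', join_values),
    )
)

# Attach the joined values to the matching page; pages that receive no
# internal links end up with NaN in the three new columns.
pages = pages.join(inbound, on='url')
print(pages)
```

Each row now carries the links that point to that page rather than a value shared by all rows.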

## Sample of converted columns and extracted data

In [6]:
crawl_df.filter(regex='^has|^contains|^url$|meta|^uses|Internal').head()

| | url | metaTitleContent | metaDescriptionContent | hasMetaTitle | hasMetaDescription | containsImage | usesSchema | Internal_Links_url | Internal_Links_text | Internal_Links_nofollow |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://wordlift.io/academy-entries/knowledge-... | Knowledge Graph and Panels \| Webinar With Andr... | Join the webinar to learn all the relevant tip... | True | True | True | True | https://wordlift.io/case-studies/news-and-medi... | Solutions@@By Market@@By Plan@@Product@@Our Pr... | False@@False@@False@@False@@False@@False@@Fals... |
| 1 | https://wordlift.io/academy-entries/mobile-fir... | Mobile First Indexing \| Webinar with Cindy Krum | If you are working in SEO, you really can’t ig... | True | True | True | True | https://wordlift.io/case-studies/news-and-medi... | Solutions@@By Market@@By Plan@@Product@@Our Pr... | False@@False@@False@@False@@False@@False@@Fals... |
| 2 | https://wordlift.io/academy-entries/dati-strut... | Come sfruttare i dati strutturati per aumentar... | Scopri come aumentare esposizione e visibilità... | True | True | True | True | https://wordlift.io/case-studies/news-and-medi... | Solutions@@By Market@@By Plan@@Product@@Our Pr... | False@@False@@False@@False@@False@@False@@Fals... |
| 3 | https://wordlift.io/academy-entries/open-sourc... | Open Source Knowledge Graph: Build Your Own be... | Take a step forward and start creating your ow... | True | True | True | True | https://wordlift.io/case-studies/news-and-medi... | Solutions@@By Market@@By Plan@@Product@@Our Pr... | False@@False@@False@@False@@False@@False@@Fals... |
| 4 | https://wordlift.io/academy-entries/voice-sear... | Is Voice Here to Stay? It is now 2020 — Live W... | Learn from this webinar tips about voice searc... | True | True | True | True | https://wordlift.io/case-studies/news-and-medi... | Solutions@@By Market@@By Plan@@Product@@Our Pr... | False@@False@@False@@False@@False@@False@@Fals... |


In [7]:
import random
# Placeholder values: random numbers stand in for Google Search Console metrics (Phase 3 data).
crawl_df['gsc_7DaysClicks'] = [random.randint(10, 200) for _ in range(len(crawl_df))]
crawl_df['gsc_7DaysImpressions'] = [random.randint(1000, 10000) for _ in range(len(crawl_df))]

## Convert to JSON

Example of [https://wordlift.io/academy-entries/knowledge-graph-and-panels-webinar/](https://wordlift.io/academy-entries/knowledge-graph-and-panels-webinar/)

In [8]:
from pprint import pprint
# pprint(crawl_df.filter(regex='^has|^contains|^url$|meta|Internal').head(1).to_dict(orient='records')[0], indent=4)

```python
{   'Internal_Links_nofollow': 'False@@False@@False@@False@@False@@False@@False@@False',
    'Internal_Links_text': 'Solutions@@By Market@@By Plan@@Product@@Our '
                           'Products@@Smart ContentImprove your search '
                           'visibility with AI-Driven '
                           'Solutions.@@Resources@@Learn',
    'Internal_Links_url': 'https://wordlift.io/case-studies/news-and-media/@@https://wordlift.io/case-studies/news-and-media/@@https://wordlift.io/case-studies/news-and-media/@@https://wordlift.io/case-studies/news-and-media/@@https://wordlift.io/case-studies/news-and-media/@@https://wordlift.io/case-studies/news-and-media/@@https://wordlift.io/case-studies/news-and-media/@@https://wordlift.io/case-studies/news-and-media/',
    'containsImage': True,
    'hasMetaDescription': True,
    'hasMetaTitle': True,
    'metaDescriptionContent': 'Join the webinar to learn all the relevant tips '
                              'to improve your branding on Google’s Knowledge '
                              'Graph panels.',
    'metaTitleContent': 'Knowledge Graph and Panels | Webinar With Andrea '
                        'Volpini, Jason Barnard and Dixon Jones - AI-Powered '
                        'SEO • WordLift',
    'url': 'https://wordlift.io/academy-entries/knowledge-graph-and-panels-webinar/'}
```

Example of [https://wordlift.io/academy-entries/mobile-first-indexing](https://wordlift.io/academy-entries/mobile-first-indexing)

In [9]:
# pprint(crawl_df.filter(regex='^has|^contains|^url$|meta|Internal').head(2).to_dict(orient='records')[1], indent=4)

```python
{   'Internal_Links_nofollow': 'False@@False@@False@@False@@False@@False@@False@@False',
    'Internal_Links_text': 'Solutions@@By Market@@By Plan@@Product@@Our '
                           'Products@@Smart ContentImprove your search '
                           'visibility with AI-Driven '
                           'Solutions.@@Resources@@Learn',
    'Internal_Links_url': 'https://wordlift.io/case-studies/news-and-media/@@https://wordlift.io/case-studies/news-and-media/@@https://wordlift.io/case-studies/news-and-media/@@https://wordlift.io/case-studies/news-and-media/@@https://wordlift.io/case-studies/news-and-media/@@https://wordlift.io/case-studies/news-and-media/@@https://wordlift.io/case-studies/news-and-media/@@https://wordlift.io/case-studies/news-and-media/',
    'containsImage': True,
    'hasMetaDescription': True,
    'hasMetaTitle': True,
    'metaDescriptionContent': 'If you are working in SEO, you really can’t '
                              'ignore how the Mobile First Indexing is '
                              'rearranging Google’s index. Watch Cindy Krum’s '
                              'webinar!',
    'metaTitleContent': 'Mobile First Indexing | Webinar with Cindy Krum',
    'url': 'https://wordlift.io/academy-entries/mobile-first-indexing/'}
```
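The records above keep the internal links as three parallel `@@`-delimited strings. To get closer to the nested `Internal_Links` structure listed earlier (`anchorTextContent`, `NoFollow`, `Link`), the strings can be split and zipped together. This is a minimal sketch: the `nest_internal_links` helper and the shortened `record` are hypothetical, and the exact nesting expected by SEOntology is an assumption.

```python
import json

# A flat record shaped like the examples above (hypothetical, shortened values).
record = {
    'url': 'https://example.com/page/',
    'Internal_Links_url': 'https://example.com/a/@@https://example.com/b/',
    'Internal_Links_text': 'Solutions@@By Market',
    'Internal_Links_nofollow': 'False@@False',
    'hasMetaTitle': True,
}


def nest_internal_links(flat, sep='@@'):
    """Return a copy of `flat` with the Internal_Links_* strings zipped into a list."""
    nested = {k: v for k, v in flat.items() if not k.startswith('Internal_Links_')}
    urls = flat.get('Internal_Links_url', '').split(sep)
    texts = flat.get('Internal_Links_text', '').split(sep)
    nofollows = flat.get('Internal_Links_nofollow', '').split(sep)
    nested['Internal_Links'] = [
        {'Link': url, 'anchorTextContent': text, 'NoFollow': nofollow == 'True'}
        for url, text, nofollow in zip(urls, texts, nofollows)
        if url  # an empty string means the page has no internal links
    ]
    return nested


# json.dumps emits valid JSON (double quotes, lowercase true/false).
print(json.dumps(nest_internal_links(record), indent=4, ensure_ascii=False))
```

Unlike `pprint`, which prints a Python dict, `json.dumps` produces output that any JSON parser can read.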

## Extracted elements' usage

In [10]:
(crawl_df
 .filter(regex='^has|^contains|^url$|^uses')
 .notna()
 .mean()
 .to_frame()[1:]
 .rename(columns={0: '% usage'})
 .style
 .format('{:.1%}'))

| | % usage |
|---|---|
| hasMetaTitle | 100.0% |
| hasMetaDescription | 100.0% |
| containsImage | 100.0% |
| usesSchema | 100.0% |
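Because the `has*`, `contains*` and `uses*` columns are already booleans with no missing values, `.notna()` returns `True` for every cell, so the calculation above reports 100% whatever the data. Averaging the booleans directly gives the share of pages that pass each check. A small sketch with a hypothetical `flags` DataFrame:

```python
import pandas as pd

# Hypothetical boolean flags for four pages.
flags = pd.DataFrame({
    'hasMetaTitle': [True, True, True, True],
    'hasMetaDescription': [True, False, True, True],
    'containsImage': [True, True, False, False],
})

# The mean of a boolean column is the fraction of True values.
usage = flags.mean().to_frame('% usage')
print(usage)
```

On the crawl data this would amount to filtering with `regex='^has|^contains|^uses'` and calling `.mean()` without `.notna()`.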
