# URLs

Use this notebook to count and tabulate the URLs and domains extracted from the content of the study carrel. The output shows which remote resources the authors of the carrel's content deemed important enough to link to.

Consider using the output of this notebook as the input to an Internet spider (like `wget`), and then use the harvested content as the input for another study carrel. Doing so is like downloading all the citations in a paper and analyzing them.
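
The hand-off to a spider can be sketched as writing one URL per line to a plain text file. The example below is only an illustration: the helper name `write_url_list`, the file name `urls.txt`, and the sample rows are made up, and the table layout (a table named `url` with a `url` column) is assumed from the queries used in this notebook.

```python
import sqlite3
import tempfile
from pathlib import Path


def write_url_list(connection, destination):
    """Write each distinct URL on its own line; return the number written."""
    # assumption: a table named "url" with a column named "url"
    rows = connection.execute("SELECT DISTINCT url FROM url ORDER BY url;").fetchall()

    # skip NULL or empty values, which a spider cannot fetch
    urls = [row[0] for row in rows if row[0]]

    Path(destination).write_text("".join(url + "\n" for url in urls), encoding="utf-8")
    return len(urls)


# demonstrate with a throwaway in-memory database and invented rows
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE url (url TEXT, domain TEXT)")
demo.executemany(
    "INSERT INTO url (url, domain) VALUES (?, ?)",
    [
        ("https://example.org/a", "example.org"),
        ("https://example.org/a", "example.org"),
        ("https://example.org/b", "example.org"),
        ("https://example.com/", "example.com"),
    ],
)

# hypothetical output location; point this wherever is convenient
target = Path(tempfile.mkdtemp()) / "urls.txt"
written = write_url_list(demo, target)
print(f"Wrote {written} URLs to {target}")
```

A file of this shape can then be handed to the spider, for example with `wget --input-file=urls.txt`.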

In [None]:
# configure
ETC      = 'etc'
DATABASE = 'reader.db'
URLS     = '''SELECT url, COUNT( url ) AS count
              FROM url
              GROUP BY url
              ORDER BY count DESC, url;'''
DOMAINS  = '''SELECT LOWER( domain ) AS domain, COUNT( domain ) AS count
              FROM url
              GROUP BY LOWER( domain )
              ORDER BY count DESC, domain;'''
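
One way to sanity-check tallies of this kind is to run per-group counts against a tiny in-memory database. The sketch below assumes a `url` table with `url` and `domain` columns, as in the queries above. The sample rows are invented, and the two queries are simplified stand-ins, not the notebook's own constants.

```python
import sqlite3

# invented rows: one URL appears twice, and one domain differs only by case
SAMPLE = [
    ("https://example.org/a", "example.org"),
    ("https://example.org/a", "example.org"),
    ("https://Example.org/b", "Example.org"),
    ("https://example.com/", "example.com"),
]

demo = sqlite3.connect(":memory:")
demo.row_factory = sqlite3.Row
demo.execute("CREATE TABLE url (url TEXT, domain TEXT)")
demo.executemany("INSERT INTO url (url, domain) VALUES (?, ?)", SAMPLE)

# tally how many times each URL occurs
url_rows = demo.execute(
    """SELECT url, COUNT(*) AS count
       FROM url
       GROUP BY url
       ORDER BY count DESC, url;"""
).fetchall()
url_counts = {row["url"]: row["count"] for row in url_rows}

# tally domains case-insensitively by grouping on the lower-cased value
domain_rows = demo.execute(
    """SELECT LOWER(domain) AS domain, COUNT(*) AS count
       FROM url
       GROUP BY LOWER(domain)
       ORDER BY count DESC, domain;"""
).fetchall()
domain_counts = {row["domain"]: row["count"] for row in domain_rows}

for name, count in domain_counts.items():
    print("\t".join([name, str(count)]))
```

Grouping on `LOWER(domain)` instead of the raw column is what folds `Example.org` and `example.org` into a single row.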

In [None]:
# require
from pathlib import Path
import sqlite3
import sys

In [None]:
# initialize; the carrel is taken to be the parent of the current working directory
carrel                 = Path().absolute().parent
database               = carrel/ETC/DATABASE
connection             = sqlite3.connect( database )
connection.row_factory = sqlite3.Row
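
Note that `sqlite3.connect` does not fail when the file is missing; it quietly starts a new, empty database, so a wrong path only surfaces later as a missing-table error. A minimal sketch of an early check follows, assuming that failing fast is preferred. The helper name `open_carrel_database` is hypothetical and not part of the notebook.

```python
import sqlite3
from pathlib import Path


def open_carrel_database(path):
    """Open an existing SQLite file, or raise instead of creating a new one."""
    path = Path(path)

    # refuse to continue when the expected file is absent
    if not path.is_file():
        raise FileNotFoundError(f"No database found at {path}")

    connection = sqlite3.connect(path)
    connection.row_factory = sqlite3.Row
    return connection
```

With the names used in this notebook, the call would look like `connection = open_carrel_database( carrel/ETC/DATABASE )`.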

In [None]:
# find all URLs and count the number of results
rows  = connection.execute( URLS )
rows  = rows.fetchall()
total = len( rows )

In [None]:
# process each item, conditionally
if total > 0 :
    
    # process each result
    for row in rows :
    
        # output
        print( "\t".join( [ row[ 'url' ], str( row[ 'count' ] ) ] ) )

# output
else : print( "This carrel contains zero URLs." )

In [None]:
# find all domains and count the number of results
rows  = connection.execute( DOMAINS )
rows  = rows.fetchall()
total = len( rows )

In [None]:
# process each item, conditionally
if total > 0 :
    
    # process each result
    for row in rows :
    
        # output
        print( "\t".join( [ row[ 'domain' ], str( row[ 'count' ] ) ] ) )

# output and done
else : print( "This carrel contains zero domains." )

In [None]:
# clean up and done
connection.close()
exit()