# Libraries

Python comes with a number of built-in functions, but you can access more functions by importing **libraries**. 

Libraries are collections of functions and other code that someone has created to tackle a particular problem. This is one of the great features of modern programming: anyone can create and share code for others to use and build upon as new problems come along.

There are libraries for scraping, for dealing with particular data types such as JSON and XML, for data visualisation, for statistical analysis, for working with dates and times, for producing HTML and JavaScript outputs, and many other situations.

You can install a library from within a Jupyter notebook by running `!conda install` or `!pip install` followed by the name of the library (the `!` tells Jupyter to pass the line to the system shell). [This post outlines more specific good practice](https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/): first `import sys`, then use `!conda install --yes --prefix {sys.prefix}` followed by the name of the package you want to install. This makes sure the package goes into the same environment that the notebook's Python kernel is using, rather than whichever environment the shell happens to find first.
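
The post linked above also describes a pip form, `!{sys.executable} -m pip install`, which runs pip with the same Python interpreter as the notebook kernel. Here is a minimal sketch for checking which environment that is. The helper `pip_install_command` is hypothetical: it only builds the command, it does not run it.

```python
import sys

def pip_install_command(package):
    """Hypothetical helper: build a pip command that targets this kernel's Python.

    The returned list could be passed to subprocess.check_call() to install.
    """
    return [sys.executable, "-m", "pip", "install", package]

# The environment that `--prefix {sys.prefix}` points conda at
print("Environment prefix:", sys.prefix)
# The interpreter the notebook kernel is running
print("Interpreter:", sys.executable)
print(pip_install_command("pandas"))
```

If those paths don't match the environment you expected, packages installed from a plain shell may not be visible to the notebook.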

Here, for example, is the code to *install* the `pandas` library, which is [used for data analysis](https://pandas.pydata.org/).

```python
#code taken from https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/
import sys
!conda install --yes --prefix {sys.prefix} pandas
```

```
Solving environment: done

# All requested packages already installed.
```
Once installed, the library must be *imported* before its functions are available to use in your code.

```python
import pandas
```

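If `import` fails with `ModuleNotFoundError`, the library isn't installed in the environment the notebook kernel is using. A small sketch for checking beforehand, using the standard library's `importlib.util.find_spec`; the helper name `is_installed` is hypothetical:

```python
import importlib.util

def is_installed(module_name):
    """Hypothetical helper: True if this Python can find the top-level module,
    without actually importing it."""
    return importlib.util.find_spec(module_name) is not None

print(is_installed("json"))                 # True (part of the standard library)
print(is_installed("no_such_library_xyz"))  # False
```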
Functions from the library are called by prefixing them with the name of the library and a period. You can spot a `pandas` function, then, because it begins `pandas.`.
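
The same prefix rule applies to every imported library, including those in Python's standard library. A quick sketch using the built-in `math` module, chosen here only because it needs no installation:

```python
import math

# sqrt lives inside the math module, so it is called with the math. prefix
root_of_16 = math.sqrt(16)
print(root_of_16)  # 4.0

# Without the prefix, Python doesn't recognise the name
try:
    sqrt(16)
except NameError as err:
    print("NameError:", err)
```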

```python
#This code is adapted from https://pandas.pydata.org/pandas-docs/stable/basics.html
#It uses the date_range function to create a series of dates, then assigns those to a variable called 'index'
index = pandas.date_range('1/1/2000', periods=8)
print(index)
```

```
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
               '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08'],
              dtype='datetime64[ns]', freq='D')
```

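To make clear what `date_range('1/1/2000', periods=8)` produced, here is a rough standard-library equivalent. It assumes the daily frequency shown as `freq='D'` in the output and simply adds one day at a time; it returns plain `datetime.date` objects, not a pandas `DatetimeIndex`.

```python
import datetime

start = datetime.date(2000, 1, 1)  # '1/1/2000'
periods = 8                        # periods=8

# Daily frequency: add 0, 1, ..., 7 days to the start date
dates = [start + datetime.timedelta(days=i) for i in range(periods)]
print([d.isoformat() for d in dates])
```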

Alternatively, you will often see libraries imported under a shorter alias, like `pd`:

```python
import pandas as pd
```

The functions are then preceded by the alias, `pd.`, instead of the full library name:

```python
#This code is adapted from https://pandas.pydata.org/pandas-docs/stable/basics.html
#It uses the date_range function to create a series of dates, then assigns those to a variable called 'index'
index = pd.date_range('1/1/2000', periods=8)
print(index)
```

```
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
               '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08'],
              dtype='datetime64[ns]', freq='D')
```

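An alias is just a second name for the same module object, so `pd.date_range` and `pandas.date_range` are the same function. A minimal sketch using the standard-library `datetime` module, aliased as `dt` purely for illustration:

```python
import datetime
import datetime as dt

# Both names point at the very same module object
print(dt is datetime)  # True

# So either name works as the prefix
first_day = dt.date(2000, 1, 1)
print(first_day == datetime.date(2000, 1, 1))  # True
```

Short aliases like `pd` are a widely used convention rather than a requirement; any valid Python name would work.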

### The Scraperwiki library

Here is the code to install and import the [scraperwiki library](https://classic.scraperwiki.com/docs/python/python_help_documentation/), which contains useful functions for downloading webpages, extracting information from them, and saving that information.

`scraperwiki` is not available from conda's default channels: running `!conda install --yes --prefix {sys.prefix} scraperwiki` fails with `PackagesNotFoundError: The following packages are not available from current channels`. It can be installed with pip instead, using the kernel-specific form recommended in the post linked above:

```python
import sys
!{sys.executable} -m pip install scraperwiki
import scraperwiki
```

Once it is imported, we can follow the instructions in [the documentation for that library](https://classic.scraperwiki.com/docs/python/python_help_documentation/) to use particular functions.

### The lxml library

Another library we need is `lxml.html`, a module inside the `lxml` package: first the `lxml` package is installed, then its `lxml.html` module is imported.

```python
!conda install --yes --prefix {sys.prefix} lxml
import lxml.html
```

```
Solving environment: done

## Package Plan ##

  environment location: /Users/paul/anaconda

  added / updated specs: 
    - lxml


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    icu-58.2                   |       h4b95b61_1        22.3 MB
    lxml-4.1.1                 |   py35hef8c89e_1         1.3 MB
    gettext-0.19.8.1           |       h15daf44_3         3.4 MB
    ncurses-6.0                |       hd04f020_2         842 KB
    libiconv-1.15              |       hdd342a3_7         1.3 MB
    expat-2.2.5                |       hb8e80ba_0         128 KB
    libffi-3.2.1               |      
```

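Because `lxml.html` is a submodule of the `lxml` package, the dotted import loads the package and then its `html` part. Some standard-library packages are organised the same way; here is a sketch with `xml.dom.minidom`, used only because it ships with Python (it is not a replacement for `lxml.html`):

```python
# A dotted import loads the package and its submodule,
# and binds the top-level name `xml`
import xml.dom.minidom

# The submodule is then used with its full dotted prefix
doc = xml.dom.minidom.parseString("<page><h3>Headline</h3></page>")
headline = doc.getElementsByTagName("h3")[0].firstChild.data
print(headline)  # Headline
```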
### The `cssselect` library

In the code below we scrape a page and convert it into an `lxml` object, then install a further library, `cssselect`, which lets us drill down into the scraped page using CSS selectors.
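
A CSS selector such as `h3` picks out every element of that type. As a rough, dependency-free illustration of the result that `root.cssselect('h3')` followed by `.text_content()` gives, here is a sketch built on the standard library's `html.parser`. It is a swapped-in technique, not how `cssselect` works: it only understands a bare tag name, the class `TagTextCollector` is hypothetical, and the sample HTML is made up.

```python
from html.parser import HTMLParser

class TagTextCollector(HTMLParser):
    """Hypothetical helper: collect all text inside each <tag> element."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0   # greater than 0 while inside a matching element
        self.texts = []  # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self.texts.append("")
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text in nested tags (e.g. <a>) counts too, like text_content()
        if self.depth > 0:
            self.texts[-1] += data

sample_html = """
<html><body>
  <h3>First <a href="/one">headline</a></h3>
  <p>Not a headline</p>
  <h3>Second headline</h3>
</body></html>
"""

collector = TagTextCollector("h3")
collector.feed(sample_html)
collector.close()
for text in collector.texts:
    print(text)
```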

```python
# Download the BBC homepage
myurl = "https://www.bbc.co.uk/"
html = scraperwiki.scrape(myurl)
#print(html)
# Convert it to an lxml object
root = lxml.html.fromstring(html)
print(root)

# cssselect is now its own library: https://cssselect.readthedocs.io/en/latest/
# lxml's .cssselect() method needs it to be installed
!conda install --yes --prefix {sys.prefix} cssselect
import cssselect
```

```
<Element html at 0x1044cc598>
Solving environment: done

## Package Plan ##

  environment location: /Users/paul/anaconda

  added / updated specs: 
    - cssselect


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    cssselect-1.0.3            |           py35_0          28 KB

The following NEW packages will be INSTALLED:

    cssselect: 1.0.3-py35_0


Downloading and Extracting Packages
cssselect 1.0.3: ####################################################### | 100% 
Preparing transaction: done
Verif
```

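Now we can use `cssselect`, through the `.cssselect()` method on `lxml` elements, to pull out every `h3` heading from the scraped page, and then use `scraperwiki` to save a record to a SQLite database.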

```python
# Select every <h3> element on the page
h3s = root.cssselect('h3')
#print(len(h3s))
#print(h3s[0].text_content())
# Print the text of each heading
for h3 in h3s:
    print(h3.text_content())

# Save a sample record to the 'somepeople' table, using 'name' as the unique key
record = {"name": "paul", "age": 25}
scraperwiki.sqlite.save(['name'], record, table_name='somepeople')
```

```
FA Cup: Chelsea v Hull  and Leicester v Sheffield Utd. Live now.
US election tampering charge for Russians
Why this toothbrush debate is dividing the internet
US election tampering charge for Russians
FBI admits botching tip on Florida gunman
Merkel 'curious' about UK's Brexit aims
FA Cup: Chelsea v Hull and Leicester v Sheffield Utd. Live now.
GB's Parsons wins skeleton bronze
Willian curls in beautiful Chelsea opener
Kiwi devastated after shock skeleton mistake
Jamaica to compete after beer producer 'donates' bobsleigh
Find out which countries are topping the medal table
11 forgotten songs about football stars
Does Marvel's Black Panther live up to the hype?
Films about brainwashing we can't take our eyes off
The Cardiff murder that sparked a scandal
The man who got away with $242m using 'black magic'
Couple surprise each other by proposing at the same time
Allison Janney felt 'liberated' in I, Tonya role
Tambor criticises Amazon sexual harassment investigation
Jennifer Aniston and J
```
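
`scraperwiki.sqlite.save(['name'], record, table_name='somepeople')` stores the dictionary as a row in a SQLite table, with `name` as the unique key. As we understand it, saving another record with the same key replaces the earlier row. Here is a rough equivalent with Python's built-in `sqlite3` module, assuming an in-memory database and a hand-written table schema (scraperwiki creates its tables for you and writes to a database file on disk). The `save` helper is hypothetical.

```python
import sqlite3

# In-memory database for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE somepeople (name TEXT PRIMARY KEY, age INTEGER)")

def save(conn, record):
    """Hypothetical helper: insert a record, replacing any row with the same name
    (our assumption about how scraperwiki treats unique keys)."""
    conn.execute(
        "INSERT OR REPLACE INTO somepeople (name, age) VALUES (:name, :age)",
        record,
    )
    conn.commit()

save(conn, {"name": "paul", "age": 25})
save(conn, {"name": "paul", "age": 26})  # same key, so this replaces the first row

rows = conn.execute("SELECT name, age FROM somepeople").fetchall()
print(rows)  # [('paul', 26)]
```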

## Next: pandas

https://www.dataquest.io/blog/python-pandas-databases/