
guilhermetavares/myscrapy


Scrapy

This project and its README are on GitHub at https://github.com/guilhermetavares/myscrapy

On the page https://www.fara.gov/quick-search.html, click "Active Foreign Principals".

The URL "https://efile.fara.gov/pls/apex/f?p=171:130:0::NO:RP,130:P130_DATERANGE:N" is the start URL for the Scrapy spider.
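The start URL follows the Oracle APEX `f?p=` convention, where the colon-separated parts are positional (application, page, session, request, debug, clear cache, item names, item values). As a rough illustration, the parts can be pulled apart like this; the `parse_apex_url` helper and the key names in `APEX_FIELDS` are made up for this sketch and are not part of the project.

```python
from urllib.parse import urlsplit, parse_qs

START_URL = "https://efile.fara.gov/pls/apex/f?p=171:130:0::NO:RP,130:P130_DATERANGE:N"

# Positional meaning of the colon-separated parts of an APEX "f?p=" value.
# The key names are illustrative labels chosen for this sketch.
APEX_FIELDS = (
    "app_id", "page_id", "session", "request",
    "debug", "clear_cache", "item_names", "item_values",
)


def parse_apex_url(url):
    """Split the p= argument of an APEX URL into named parts."""
    p_value = parse_qs(urlsplit(url).query)["p"][0]
    return dict(zip(APEX_FIELDS, p_value.split(":")))


start_parts = parse_apex_url(START_URL)
print(start_parts["app_id"], start_parts["page_id"])  # 171 130
```

Read this way, the start URL points at page 130 of application 171, and sets the page item `P130_DATERANGE` to `N`.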

With the start URL set, I inspected the page: it paginates the results through a POST request sent from JavaScript, and the spider uses that same POST to move between pages.

The POST URL "https://efile.fara.gov/pls/apex/wwv_flow.show" serves every available page of data.
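The idea of paging through results with repeated POSTs can be sketched with the standard library alone. This is only an illustration of the pattern, not the spider's code: the form field names `first_row` and `max_rows`, the page size of 15, and both helper functions are hypothetical. The real request body has to be copied from the browser's network inspector. The requests are built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

POST_URL = "https://efile.fara.gov/pls/apex/wwv_flow.show"


def build_page_request(first_row, rows_per_page=15):
    """Build (without sending) a POST request for one page of results.

    NOTE: 'first_row' and 'max_rows' are hypothetical field names used
    for illustration; they are not the fields the real page submits.
    """
    form = {"first_row": first_row, "max_rows": rows_per_page}
    body = urlencode(form).encode("ascii")
    return Request(POST_URL, data=body, method="POST")


def page_requests(total_rows, rows_per_page=15):
    """Yield one POST request per page until total_rows is covered."""
    for first_row in range(1, total_rows + 1, rows_per_page):
        yield build_page_request(first_row, rows_per_page)


# 40 rows at 15 per page need three requests (rows 1, 16 and 31).
requests_to_send = list(page_requests(total_rows=40))
```

In a Scrapy spider the equivalent step would typically be a `scrapy.FormRequest` yielded from the parse callback, with the form data taken from the inspected request.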

The item has the structure:

{
        'address': '150 Broomielaw 5 Atlantic Quay',
        'country_name': 'UNITED KINGDOM',
        'date': '05/01/2012',
        'exhibit_url': 'http://www.fara.gov/docs/6334-Exhibit-AB-20160106-1.pdf',
        'foreign_principal': 'Scottish Development International',
        'registration': 'Development Counsellors International',
        'registration_date': '03/12/1993',
        'registration_number': '4777',
        'state': '',
        'url': 'https://efile.fara.gov/pls/apex/f?p=171:200:6624565563159::NO:RP,200:P200_REG_NUMBER,P200_DOC_TYPE,P200_COUNTRY:4777,Exhibit%20AB,UNITED%20KINGDOM'
}
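A small check like the one below can confirm that a scraped item carries every key listed above. It is a sketch only: `ITEM_FIELDS` and `missing_fields` are hypothetical names, not part of the project, and the check looks at key presence only (an empty string, as in `state`, still counts as present).

```python
# Keys taken from the item structure shown above.
ITEM_FIELDS = (
    "address", "country_name", "date", "exhibit_url", "foreign_principal",
    "registration", "registration_date", "registration_number", "state", "url",
)


def missing_fields(item):
    """Return the expected keys that are absent from a scraped item."""
    return [name for name in ITEM_FIELDS if name not in item]


# A partial item, to show what the helper reports.
partial_item = {
    "country_name": "UNITED KINGDOM",
    "foreign_principal": "Scottish Development International",
    "registration_number": "4777",
    "state": "",
}
print(missing_fields(partial_item))
```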

First, clone the project:

git clone git@github.com:guilhermetavares/myscrapy.git

Install the requirements in an environment with Python >= 3.4.3:

pip install -r requirements.txt

To run the tests, cd into /faragov/faragov/ and run:

python3 tests.py

The output when running with coverage:

Name                         Stmts   Miss Branch BrPart  Cover
--------------------------------------------------------------
faragov/__init__.py              0      0      0      0   100%
faragov/middlewares.py          18     18      4      0     0%
faragov/settings.py              4      4      0      0     0%
faragov/spiders/__init__.py      0      0      0      0   100%
faragov/spiders/fara.py        100      0     46      1    99%
faragov/tests.py                63      0      6      1    99%
--------------------------------------------------------------
TOTAL                          185     22     56      2    88%

To run the spider:

scrapy crawl fara -o faragov.json
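With a `.json` output file, Scrapy's feed export writes the items as a single JSON array, so the result can be loaded with the standard `json` module. Below is a minimal sketch for summarising the output by country; `count_by_country` is a hypothetical helper, and it assumes every item has the `country_name` key shown in the item structure.

```python
import json
import os
from collections import Counter


def count_by_country(path):
    """Count scraped items per country_name in a Scrapy JSON feed."""
    with open(path, encoding="utf-8") as handle:
        items = json.load(handle)  # the feed is one JSON array of items
    return Counter(item["country_name"] for item in items)


# Only summarise if the spider has already produced its output file.
if os.path.exists("faragov.json"):
    for country, total in count_by_country("faragov.json").most_common(5):
        print(country, total)
```

Note that `-o` appends to an existing file, which can leave a `.json` feed invalid after a second run, so it is safer to delete `faragov.json` before crawling again.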
