Package (Python) for Eurostat online glossaries' web scraping and semantic classification

eurostat/estatNet

estatnet

Module for Eurostat online glossaries' web scraping and semantic classification

About

This module will enable you to automatically scrape Eurostat's online "Statistics Explained" pages and index their contents into a knowledge graph. It builds a graph of the inter-relationships between the pages while extracting the semantic content they already contain (documentation, concepts, glossary entries, etc.).
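A minimal sketch of the link-graph step, using only the Python standard library rather than Scrapy or NetworkX. It assumes MediaWiki-style page URLs (`index.php?title=<Title>` or `index.php/<Title>`) and that glossary entries carry a `Glossary:` title prefix. The helper names (`LinkExtractor`, `page_title`, `page_kind`, `extract_edges`, `build_graph`) are hypothetical and not part of estatnet.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, unquote, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def page_title(url):
    """Hypothetical helper: recover a wiki page title from a MediaWiki-style URL."""
    parsed = urlparse(url)
    titles = parse_qs(parsed.query).get("title")
    if titles:  # form: .../index.php?title=Some_Page
        return titles[0]
    marker = "/index.php/"
    if marker in parsed.path:  # form: .../index.php/Some_Page
        return unquote(parsed.path.split(marker, 1)[1])
    return None


def page_kind(title):
    """Assumption: glossary entries use a 'Glossary:' title prefix."""
    return "glossary" if title.startswith("Glossary:") else "article"


def extract_edges(source_title, html, base_url):
    """Return sorted (source, target, kind) triples for the on-site links of one page."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    site = urlparse(base_url).netloc
    edges = set()
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        if urlparse(absolute).netloc != site:
            continue  # skip links that leave the site
        target = page_title(absolute)
        if target and target != source_title:  # ignore self-links
            edges.add((source_title, target, page_kind(target)))
    return sorted(edges)


def build_graph(pages, base_url):
    """Map each page title to the set of titles it links to (pages: title -> HTML)."""
    graph = {}
    for title, html in pages.items():
        graph.setdefault(title, set())
        for _, target, _kind in extract_edges(title, html, base_url):
            graph[title].add(target)
            graph.setdefault(target, set())  # linked-to pages become nodes too
    return graph
```

For example, with the placeholder base URL `https://example.org/index.php`, `extract_edges("Main_Page", '<a href="?title=Glossary:GDP">GDP</a>', "https://example.org/index.php")` returns `[("Main_Page", "Glossary:GDP", "glossary")]`. A plain adjacency dict keeps the sketch dependency-free; the same `(source, target)` pairs could instead be passed to NetworkX's `DiGraph.add_edges_from`.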

  • Status: in construction since 2018
  • License: EUPL

Description

Notes

Resources

  • Scrapy, a framework for extracting data from websites.
  • nltk (Natural Language Toolkit), for working with human language data.
  • NetworkX, a package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
  • py2neo, a client library for the Neo4j graph database; the official Bolt driver, neo4j-python-driver, also does the job.
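A minimal sketch of loading a page-link graph into Neo4j with the official Bolt driver (the `neo4j` package), batching rows through a single `UNWIND ... MERGE` Cypher query instead of sending one query per link. The `Page` label, `LINKS_TO` relationship type, `kind` property, the `(source, target, kind)` edge format, and all helper names are hypothetical choices, not estatnet's actual schema.

```python
# Hypothetical schema: (:Page {title, kind})-[:LINKS_TO]->(:Page)
LOAD_LINKS = """
UNWIND $rows AS row
MERGE (src:Page {title: row.source})
MERGE (dst:Page {title: row.target})
SET dst.kind = row.kind
MERGE (src)-[:LINKS_TO]->(dst)
"""


def to_rows(edges):
    """Turn (source, target, kind) triples into Cypher parameter rows."""
    return [{"source": s, "target": t, "kind": k} for s, t, k in edges]


def batches(rows, size=500):
    """Yield successive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def load_into_neo4j(edges, uri, user, password, batch_size=500):
    """Write the link graph to Neo4j (requires `pip install neo4j` and a running server)."""
    # Imported lazily so the pure helpers above work without the driver installed.
    from neo4j import GraphDatabase

    rows = to_rows(edges)
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            for batch in batches(rows, batch_size):
                # consume() drains the result so each batch is fully executed.
                session.run(LOAD_LINKS, {"rows": batch}).consume()
```

Using `MERGE` makes re-running the load idempotent: pages and links that already exist are matched rather than duplicated, so a re-scrape can simply be loaded again.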

References
