Welcome to the Hyphe wiki!
The following tutorials have been designed to cover all the basic features of Hyphe. The first tutorials focus more on explaining the elementary notions and techniques in the context of simple, top-down protocols. The last tutorials focus on more elaborate research designs and methodological questions.
Analyse the structure of a website, in this case: https://climate.nasa.gov/
- Creating a new corpus
- Defining and crawling a web entity
- Looking at the network of pages inside a web entity
- Looking at the folder structure of the pages of a web entity
Study how the results of different Google queries are hyperlinked, in this case queries about climate change.
- Understanding the different statuses: IN, OUT, UNDECIDED & DISCOVERED
- Changing the status of one or more web entities
- Crawling all DISCOVERED web entities
- Looking at a network of web entities
Retrace the network of a series of actors through hyperlinks, in this case the institutions of the IPCC authors.
- Importing a CSV to crawl from a file
- How to properly set the boundaries of a web entity
- Detecting crawl issues and fixing start page errors
- Tagging and advanced use of the Tagging page
About the Hypertext Corpus Initiative
Hyphe was born from the Hypertext Corpus Initiative (HCI), a research group initiated by médialab Sciences Po in October 2010 to address the issue of building hypertext corpora for the social sciences and potentially other domains.
The group first met at a kickoff workshop at médialab Sciences Po in Paris on 7 and 8 October 2010.
The HCI's ambitions are to discuss the methodology of hypertext corpora and to integrate existing web mining tools into a common technical chain.
Hyphe has been funded by the Equipex DIME-SHS (ANR-10-EQPX-19-01).