Mathieu Jacomy edited this page Jan 9, 2019 · 9 revisions

Welcome to the Hyphe wiki!

Reference Tutorials

The following tutorials are designed to cover all the basic features of Hyphe. The first tutorials explain the elementary notions and techniques in the context of simple, top-down protocols. The last tutorials address more elaborate research designs and methodological questions.

Reference Tutorial 1: Website Structure

Analyse the structure of a website, in this case: https://climate.nasa.gov/

  • Creating a new corpus
  • Defining and crawling a web entity
  • Looking at the network of pages inside a web entity
  • Looking at the folder structure of the pages of a web entity

Reference Tutorial 2: Google Search Results

Study how the results of different Google queries are hyperlinked, in this case queries about climate change.

  • Understanding the different statuses: IN, OUT, UNDECIDED & DISCOVERED
  • Changing the status of one or more web entities
  • Crawling all DISCOVERED web entities
  • Looking at a network of web entities

Reference Tutorial 3: Actors and their Ties

Retrace the network of a series of actors through hyperlinks, in this case the IPCC authors’ institutions.

  • Importing a CSV to crawl from a file
  • Properly setting the boundaries of a web entity
  • Detecting crawl issues and fixing start page errors
  • Tagging and advanced use of the Tagging page

About the Hypertext Corpus Initiative

Hyphe was born from the Hypertext Corpus Initiative (HCI), a research group initiated by médialab Sciences Po in October 2010 to address the issue of building hypertext corpora for the social sciences and potentially other domains.

The group first met at the kickoff workshop held at médialab Sciences Po in Paris on 7 and 8 October 2010.

The HCI's ambitions are to discuss methodology for building hypertext corpora and to integrate existing web mining tools into a common technical chain.


Hyphe has been funded by the Equipex DIME-SHS (ANR-10-EQPX-19-01).
