
Which server specification for Hyphe?

Mathieu Jacomy edited this page Nov 22, 2019 · 1 revision

There is no general answer to that question, but we propose a few reference points. The two classic use cases are (1) teaching students and (2) research. They do not have the same needs, but both can coexist on the same Hyphe instance and/or the same server. Use the figures below as a starting point.

A machine for students

We assume that 25 to 50 students will use the machine to learn web crawling. They will not do text analysis, so the content of web pages will not be stored. They will probably work in small groups and share projects, so we expect 10 to 20 actual corpora (not counting the test corpora created during lessons). During Hyphe teaching sessions, everyone will connect to Hyphe at the same time; the rest of the time, students will work from home at different moments.

This server needs RAM first and foremost, and also CPU power, because everyone uses it at the same time during lessons. We suggest:

  • CPU: 8-10 cores
  • RAM: ~16 GB
  • HDD: ~250 GB

A machine for researchers

We assume that researchers will rarely all work at the same time. However, we expect them to need text mining (so page content must be stored) and to build deeper, bigger corpora. In this scenario, researchers will maintain 5 to 10 different big corpora. We expect them to launch large crawl batches and let Hyphe run for as long as necessary (e.g. week-long crawl queues).

This server needs storage above all. CPU and RAM are less critical but still necessary, since big corpora require more computing power. It also needs a high-bandwidth network connection.

  • CPU: 6+ cores
  • RAM: 8-16 GB
  • HDD: ~1 TB

Minimal Hyphe server

  • CPU: 2+ cores
  • RAM: 4 GB
  • HDD: 50 GB

In this scenario, page content should not be stored, and only a few corpora can be open at the same time.

Custom spec

Use these numbers to estimate your needs:

  • Hyphe requires 250 to 500 MB of RAM for each corpus running at the same time.
  • A small, no-text, student corpus requires ~10 GB of storage.
  • A large, text-enabled, researcher corpus requires 100 GB of storage or more.
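These rules of thumb can be combined into a quick back-of-the-envelope estimate. The Python sketch below is one way to do it, assuming that every corpus is either a small no-text corpus (~10 GB) or a large text-enabled one (using the 100 GB lower bound), and using decimal units (1 GB = 1000 MB). The names `estimate` and `Estimate` are hypothetical helpers for illustration, not part of Hyphe.

```python
from dataclasses import dataclass

# Rules of thumb from the list above (hypothetical constant names).
RAM_PER_OPEN_CORPUS_MB = (250, 500)  # low / high estimate per open corpus
STORAGE_SMALL_CORPUS_GB = 10         # small corpus, page content not stored
STORAGE_LARGE_CORPUS_GB = 100        # large corpus with text: a lower bound


@dataclass
class Estimate:
    ram_gb_low: float
    ram_gb_high: float
    storage_gb: float  # lower bound if any large corpora are included


def estimate(open_corpora: int, small_corpora: int = 0, large_corpora: int = 0) -> Estimate:
    """Rough RAM and storage needs for Hyphe corpora (decimal units)."""
    if min(open_corpora, small_corpora, large_corpora) < 0:
        raise ValueError("corpus counts must be non-negative")
    low_mb, high_mb = RAM_PER_OPEN_CORPUS_MB
    return Estimate(
        # RAM scales with the number of corpora running at the same time.
        ram_gb_low=open_corpora * low_mb / 1000,
        ram_gb_high=open_corpora * high_mb / 1000,
        # Storage scales with the total number of corpora kept on disk.
        storage_gb=small_corpora * STORAGE_SMALL_CORPUS_GB
        + large_corpora * STORAGE_LARGE_CORPUS_GB,
    )


if __name__ == "__main__":
    print(estimate(20, small_corpora=20))   # student machine scenario
    print(estimate(10, large_corpora=10))   # researcher machine scenario
```

For instance, `estimate(20, small_corpora=20)` gives 5 to 10 GB of RAM and 200 GB of storage, which fits the ~16 GB / ~250 GB student machine; `estimate(10, large_corpora=10)` gives 2.5 to 5 GB of RAM and at least 1000 GB of storage, in line with the ~1 TB researcher machine. The sketch only covers Hyphe's per-corpus needs, so leave some headroom for the operating system and other services.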