TorrentProject.se DB dump for the Open Bay Project #10

Open · nicoboss opened this issue Jan 11, 2015 · 7 comments

@nicoboss

http://torcache.net/torrent/EACEF60FE77A771ACBC28E6D65A593BDB800EA28.torrent
https://mega.co.nz/#!J5sm3YAS!_QId1T0SCZlGMkMGNNVa9vDmOtVgiZn4fmi350fFH_o
https://drive.google.com/file/d/0B13QNh6ZKU3TaGcwcWxqU0V5blk/view?usp=sharing
https://www.dropbox.com/s/urhx0lkn2bo1fzr/TorrentProject_DB_Dump_January_2015_V1.7z?dl=0
http://www.nicobosshard.ch/Documents/TorrentProject_DB_Dump_January_2015_V1.7z

Today I released a new DB for the Open Bay Project! This one is from the famous http://torrentproject.se/ site and was made using the official dailydump. It contains 5685235 hashes. It's a wonderful DB, and the best thing is that it's very easy to update. This dump contains a few more torrents than the Kickass one, but less additional information. The program will also work for other sites that offer an API export function, such as https://kickass.so/, but don't forget that HTML dumps contain much more additional information. Thanks @ekoice for the idea to use the API and for emphasizing the importance of a simple update process.
Now we have 4 DB Dumps:

  • The official CSV release from the isoHunt team
  • The default OPB dump from @TPBT-OFFICIAL, which has a lot of additional information but contains only half as many hashes as the CSV release
  • The Kickass one from me
  • The TorrentProject one from me

How to import this DB:

  1. Download and extract the DB
  2. Import the 2 SQL files with LOAD DATA INFILE or with a graphical interface like phpMyAdmin (a command-line sketch follows this list).
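For example, if the archive contains plain SQL dump files, step 2 could look like this from the mysql command-line client (a minimal sketch; the file names are placeholders for the two files in the extracted archive):

SOURCE C:/path/to/TorrentProject_Dump_1.sql;
SOURCE C:/path/to/TorrentProject_Dump_2.sql;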

How to update this DB in 2 Steps:

  1. Download and extract or clone https://github.com/nicoboss/KickassCopy, go into the UpdateTorrentProject folder, run the update.bat file (Windows x64 only) and wait a few minutes. You can also use the UpdateTorrentProject included in the torrent file; it might be outdated, but it would still work.
  2. Copy the dailydump.csv to your /mysql/data/…/ folder, then customize and execute the following SQL script. The schema for torrentproject.dailydump can also be found in the torrent file, and db.torrent is your final DB.

LOAD DATA INFILE 'dailydump.csv' INTO TABLE torrentproject.dailydump FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';

INSERT IGNORE INTO db.torrent (name, hash, tags) SELECT name, hash, tags FROM torrentproject.dailydump;

TRUNCATE TABLE torrentproject.dailydump;

UPDATE db.torrent SET description='' WHERE description is NULL;
UPDATE db.torrent SET category_id=2 WHERE tags like 'Applications%';
UPDATE db.torrent SET category_id=3 WHERE tags like 'Games%' or tags like 'Mobile%';
UPDATE db.torrent SET category_id=4 WHERE tags like 'Adult%';
UPDATE db.torrent SET category_id=5 WHERE tags like 'Video%';
UPDATE db.torrent SET category_id=6 WHERE tags like 'Audio%';
UPDATE db.torrent SET category_id=7 WHERE tags like 'Images%';
UPDATE db.torrent SET category_id=7 WHERE tags='';
UPDATE db.torrent SET category_id=8 WHERE tags='Video Tv';
UPDATE db.torrent SET category_id=9 WHERE tags like 'Ebooks%';
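If you prefer a single pass over the table, the same category mapping can also be written as one UPDATE with a CASE expression (a sketch that should be equivalent to the statements above; note that 'Video Tv' has to be matched before the 'Video%' pattern):

UPDATE db.torrent SET category_id = CASE
    WHEN tags LIKE 'Applications%' THEN 2
    WHEN tags LIKE 'Games%' OR tags LIKE 'Mobile%' THEN 3
    WHEN tags LIKE 'Adult%' THEN 4
    WHEN tags = 'Video Tv' THEN 8
    WHEN tags LIKE 'Video%' THEN 5
    WHEN tags LIKE 'Audio%' THEN 6
    WHEN tags LIKE 'Images%' OR tags = '' THEN 7
    WHEN tags LIKE 'Ebooks%' THEN 9
    ELSE category_id
END;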

@ghost

ghost commented Jan 12, 2015

This DB is good; too bad it doesn't have torrent age, seeders, leechers, file count...
Good work indeed...

@csmarauder

Any chance we can get an SQL file out of this? Also, I have been considering taking all the dumps and making one big dump file out of them. Does anybody think this is a good idea?

@nicoboss (Author)

I'll release one big all-in-one dump today. The reason I've waited so long is that I haven't received a reply from the Bitsnoop team about why their daily dump download link doesn't work. Now I've found a backup from 26 February 2014 that contains 16949868 torrents (names and hashes). Not the best solution for the Bitsnoop dump, but enough to use for my first all-in-one release.
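A merge along these lines can reuse the INSERT IGNORE trick from the update script, as long as the combined table has a UNIQUE index on the hash so duplicates get skipped. A rough sketch (the source table names are made up; use the tables from the individual dumps):

-- assumed: db.torrent does not already have a unique index on hash
ALTER TABLE db.torrent ADD UNIQUE INDEX idx_hash (hash);
INSERT IGNORE INTO db.torrent (name, hash, tags) SELECT name, hash, tags FROM kickass_dump.torrent;
INSERT IGNORE INTO db.torrent (name, hash) SELECT name, hash FROM bitsnoop_dump.torrent;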

@ghost

ghost commented Jan 13, 2015

@nicoboss does that big backup have working leechers/seeders?

@nicoboss (Author)

No, sorry, only names and hashes. I know it may be useless for you, but as soon as the official Bitsnoop API works again I will make a new version with seeders, leechers and categories. If they don't fix it by next week, I won't have any choice other than making an HTML dump of Bitsnoop or torrentz.eu. But neither is an easy dump: Bitsnoop doesn't have the hashes on its search pages, and torrentz.eu always gives me an IP ban after 1000 downloads (1000*48 = 48000 links/IP/day). And the annoying thing is that it's nearly impossible to change the IP address every 10 minutes except by using the Tor network, which doesn't work with wget or HTTrack.
I'll release the full dump tomorrow, because the duplicate check of 39766825 rows will take very long. I lost 4 hours because I used InnoDB instead of MyISAM. I'll never use InnoDB again for big databases!

InnoDB:
• Time to copy a 2.5GB DB: 3h
• Time to load the last row of 7'000'000: 90s
• Copying a big DB often randomly adds empty rows at the end!!!

MyISAM:
• Time to copy a 2.5GB DB: 60s
• Time to load the last row of 7'000'000: 10s

@TPBT-OFFICIAL

@nicoboss Interesting... Also, about InnoDB and MyISAM: does that explain why tpbt.org (my website) takes so much time to load every once in a while?

@nicoboss (Author)

Yes, probably. There are three things you can do to optimize your speed:

  1. Expand all buffers. That'll use more memory but the speed will improve a lot.
  2. Use MyISAM (first copy the schema, then change the DB engine of the copied schema, and finally copy all data into it; don't convert the InnoDB table into MyISAM with the convert function! See the sketch below.)
  3. Use utf8_general_ci

Both storage engines have their advantages and disadvantages, but because Open Bay doesn't need the special InnoDB features, I think MyISAM is the right choice. If you tune up InnoDB you'll also get a lot more speed. I don't know why InnoDB is so slow on my server (XAMPP on Windows); normally the difference shouldn't be so big. The best thing is to try both with different configurations.
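A sketch of point 2 (the table names are just examples): copy the schema, switch the engine while the copy is still empty, then move the data over and swap the tables.

CREATE TABLE db.torrent_copy LIKE db.torrent;
ALTER TABLE db.torrent_copy ENGINE = MyISAM;
INSERT INTO db.torrent_copy SELECT * FROM db.torrent;
RENAME TABLE db.torrent TO db.torrent_innodb, db.torrent_copy TO db.torrent;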

I'm using the following configuration in my my.ini config file to get more speed by using more RAM (500 MB to 1 GB):

[mysqld]
key_buffer = 16M
max_allowed_packet = 10M
sort_buffer_size = 20M
net_buffer_length = 64K
read_buffer_size = 20M
read_rnd_buffer_size = 20M
myisam_sort_buffer_size = 80M

innodb_buffer_pool_size = 512M
innodb_additional_mem_pool_size = 16M
innodb_log_file_size = 16M
innodb_log_buffer_size = 8M

[mysqldump]
quick
max_allowed_packet = 16M

[isamchk]
key_buffer = 40M
sort_buffer_size = 40M
read_buffer = 20M
write_buffer = 20M

[myisamchk]
key_buffer = 40M
sort_buffer_size = 40M
read_buffer = 20M
write_buffer = 20M
