TPO EXTRACTOR

This project is dedicated to dumping the internal database of the TPO application developed by zhan.com and releasing all of the TOEFL reading passages and listening transcripts it contains as Microsoft Word documents.

This project is currently inactive and is not expecting any contributions. However, if you are interested in the source code, you are welcome to clone this repository and use it as you like, subject to the license terms listed under Author and License.

The TPO application is a test-simulation program made by a Chinese company that specializes in English training services for domestic students preparing for the TOEFL, IELTS and other exams. It contains a wide variety of learning materials that are helpful to these students. However, the proprietary application forbids any copying or exporting of its contents in order to give the parent company a competitive edge in the market. This project aims to extract all of these materials from the application package programmatically and export them into accessible formats.

The database mentioned above contains almost all of the data, or links to the data, used by the TPO application. This includes listening materials in the form of audio recordings, listening transcripts, translations of the listening transcripts, and so on. It is technically possible to dump all of them into readable formats (e.g. HTML or MS Word documents). So far, this project has extracted only the reading passages and listening transcripts from the database.
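As a rough illustration of what exporting to a readable format can look like, here is a minimal Java sketch that renders one passage as a standalone HTML page. It is not the code used in this repository: the `Passage` record, its `title` and `body` fields, and the `PassageHtmlExport` class are hypothetical names chosen for the example, and the real database schema is not described here. HTML is used instead of Word only because it needs no third-party library.

```java
public class PassageHtmlExport {
    // Hypothetical shape of one extracted item; the real schema may differ.
    public record Passage(String title, String body) {}

    // Escape the characters that have special meaning in HTML.
    // "&" is replaced first so the entities added afterwards are not escaped again.
    public static String escapeHtml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    // Render a passage as a complete HTML document,
    // with one <p> element per non-blank line of the body.
    public static String toHtml(Passage passage) {
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\"><title>")
            .append(escapeHtml(passage.title()))
            .append("</title></head>\n<body>\n<h1>")
            .append(escapeHtml(passage.title()))
            .append("</h1>\n");
        for (String line : passage.body().split("\\R")) {
            if (!line.isBlank()) {
                html.append("<p>").append(escapeHtml(line.strip())).append("</p>\n");
            }
        }
        html.append("</body>\n</html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        // Made-up sample text, not taken from the TPO database.
        Passage sample = new Passage("Sample Passage", "First paragraph.\n\nSecond & last paragraph.");
        System.out.print(toHtml(sample));
    }
}
```

Running `main` prints the generated page to standard output; writing the same string to a file would give one HTML document per passage. The sketch needs Java 16 or later because it uses a record.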

The software company that made the TPO application may in the future decide to change the application package so that this kind of content extraction is no longer possible. In case that happens, I have archived the latest working version of the application with the Wayback Machine. To download it, click on this link. (You need a VPN to access the Wayback Machine's website.)

Most people in mainland China have limited technical skills, and the flow of information there is highly restricted. As a result, they are exploited and controlled by such for-profit companies. Education in China is badly distorted these days, partly because of the existence of these companies.

For instructions on regenerating the Word documents from the source code, please see the latest release note. If you want to know more about how this program works, you can get in touch with me by email.

Document Preview

For a preview of these documents, go to this page.

Author and License

Copyright (C) 2018-2022 Scott X. Liang <scott.liang@pm.me>

Except where otherwise noted, the program in this repository is licensed under the GNU General Public License, version 3.

The documents released by this project are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

