Reorganise the documentation. #204

Merged
merged 29 commits into from
Oct 19, 2023
73cdceb
Add not to dump Wikipedia
robkam Oct 8, 2023
428f2fd
Add not to dump Wikipedia
robkam Oct 11, 2023
17f0964
Add to not dump Wikipedia.
robkam Oct 18, 2023
c4b81f8
Create INSTALLATION.md
robkam Oct 18, 2023
a4ed04c
Create USAGE.md
robkam Oct 18, 2023
5b827b6
tidy up grep command
robkam Oct 18, 2023
30b5fff
Update README.md
robkam Oct 18, 2023
05ce377
Update INSTALLATION.md
robkam Oct 18, 2023
0847224
Update INSTALLATION.md
robkam Oct 18, 2023
47ad947
Update USAGE.md
robkam Oct 18, 2023
db0c27e
insert blank line after heading
robkam Oct 18, 2023
b764a6e
Create PUBLISHING.md
robkam Oct 18, 2023
467dee1
Update uploader.py
robkam Oct 18, 2023
9fe21a1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 18, 2023
6f0d092
fix typo
robkam Oct 18, 2023
aea05ea
change link to PUBLISHING.md
robkam Oct 18, 2023
53e40c5
tidy titles
robkam Oct 18, 2023
df591c0
better title for last section
robkam Oct 18, 2023
05fe7ea
rm line better in installation.md
robkam Oct 18, 2023
577c0f1
run in any directory
robkam Oct 18, 2023
e43ca9d
fix typo
robkam Oct 18, 2023
06adea1
run dumpgenerator in any folder
robkam Oct 18, 2023
463887b
link to usage for launcher & uploader
robkam Oct 18, 2023
73a83e1
Publishing the dump
robkam Oct 18, 2023
661e41d
insert missing word
robkam Oct 18, 2023
c3544f5
clarify instructions
robkam Oct 18, 2023
2ec81df
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 18, 2023
c0f1311
instructions are now at MediaWiki Dump Generator
robkam Oct 18, 2023
b4f2e6d
to dump a wiki that only logged in users can read
robkam Oct 19, 2023
229 changes: 229 additions & 0 deletions INSTALLATION.md
@@ -0,0 +1,229 @@
# Installation

## Python Environment

`MediaWiki Dump Generator` requires [Python 3.8](https://www.python.org/downloads/release/python-380/) or later (but earlier than 4.0), although you may be able to get it to run with earlier versions of Python 3. On recent versions of Linux and macOS, Python 3.8 should come preinstalled, but on Windows you will need to install it from [python.org](https://www.python.org/downloads/release/python-380/).
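
The version requirement can be checked from the shell before going any further. The following is a minimal sketch, not something shipped with this repository: the function name `python_version_ok` is made up for illustration, and it assumes the interpreter is available as `python3` (on Windows it may be `python` or `py` instead).

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a version string such as "3.11.4"
# is at least 3.8 and below 4.0.
python_version_ok() {
  local major rest minor
  major=${1%%.*}      # text before the first dot
  rest=${1#*.}        # text after the first dot
  minor=${rest%%.*}   # second component only
  [ "$major" -eq 3 ] && [ "$minor" -ge 8 ]
}

# Assumption: the interpreter is called python3.
if command -v python3 >/dev/null 2>&1; then
  found=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
  if python_version_ok "$found"; then
    echo "Python $found is in the supported range"
  else
    echo "Python $found is outside the supported range (>=3.8, <4.0)"
  fi
else
  echo "python3 not found on PATH"
fi
```

If the check reports an unsupported version, installing a newer Python as described in the platform notes below is probably the simplest fix.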

`MediaWiki Dump Generator` has been tested on Linux, macOS, Windows and Android. If you are connecting to Linux or macOS via `ssh`, you can continue using the `bash` or `zsh` command prompt in the same terminal, but if you are starting in a desktop environment and don't already have a preferred Terminal environment you can try one of the following.

> **NOTE:** You may need to update and pre-install dependencies in order for `MediaWiki Dump Generator` to work properly. Shell commands for these dependencies appear below each item in the list. (Also note that while installing and running `MediaWiki Dump Generator` itself should not require administrative privileges, installing dependencies usually will.)

* On desktop Linux you can use the default terminal application such as [Konsole](https://konsole.kde.org/) or [GNOME Terminal](https://help.gnome.org/users/gnome-terminal/stable/).

<details>
<summary>Linux Dependencies</summary>

While most Linux distributions will have Python 3 preinstalled, if you are cloning `MediaWiki Dump Generator` rather than downloading it directly you may need to install `git`.

On Debian, Ubuntu, and the like:

```bash
sudo apt update && sudo apt upgrade && sudo apt install git
```

(On Fedora, Arch, etc., use `dnf`, `pacman`, etc., instead.)

</details>

* On macOS you can use the built-in application [Terminal](https://support.apple.com/guide/terminal), which is found in `Applications/Utilities`.

<details>
<summary>macOS Dependencies</summary>

While macOS will have Python 3 preinstalled, if you are cloning `MediaWiki Dump Generator` rather than downloading it directly and you are using an older version of macOS, you may need to install `git`.

If `git` is not preinstalled, however, macOS will prompt you to install it the first time you run the command. Therefore, to check whether you have `git` installed or to install `git`, simply run `git` (with no arguments) in Terminal:

```bash
git
```

If `git` is already installed, it will print its usage instructions. If `git` is not preinstalled, the command will pop up a window asking if you want to install Apple's command line developer tools, and clicking "Install" in the popup window will install `git`.

</details>

* On Windows 10 or Windows 11 you can use [Windows Terminal](https://aka.ms/terminal).

<details>
<summary>Windows Dependencies</summary>

The latest version of Python is available from [python.org](https://www.python.org/downloads/). Python will then be available from any Command Prompt or PowerShell session. Optionally, adding `C:\Program Files\Git\usr\bin` to the `PATH` environment variable will add some useful Linux commands and utilities to Command Prompt.

If you are already using the [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/about), you can follow the Linux instructions above. If you don't want to install a full WSL distribution, [Git for Windows](https://gitforwindows.org/) provides Bash emulation, so you can use it as a more lightweight option instead. Git Bash also provides some useful Linux commands and utilities.

> When installing [Python 3.8](https://www.python.org/downloads/release/python-380/) (from python.org), be sure to check "Add Python to PATH" so that installed Python scripts are accessible from any location. If for some reason installed Python scripts, e.g. `pip`, are not available from any location, you can add Python to the `PATH` environment variable using the instructions [here](https://datatofish.com/add-python-to-windows-path/).
>
> If you follow the instructions further down and install `MediaWiki Dump Generator` using `pip`, no further changes to `PATH` should be necessary. However, if you'd prefer that Windows store installed Python scripts somewhere other than the default Python folder under `%appdata%`, you can also add your preferred alternative path, such as `C:\Program Files\Python3\Scripts\` or a subfolder of `My Documents`. (You will need to restart any terminal sessions in order for this to take effect.)

Whenever you'd like to run a Bash session, you can open a Bash terminal prompt from any folder in Windows Explorer by right-clicking and choosing the option from the context menu. (For some purposes you may wish to run Bash as an administrator.) This way you can open a Bash prompt and clone the `MediaWiki Dump Generator` repository in one location, and later open another Bash prompt and run `MediaWiki Dump Generator` to dump a wiki wherever else you'd like, without having to browse to the directory manually using Bash.

</details>

* On Android you can use [Termux](https://termux.dev).

<details>
<summary>Termux Dependencies</summary>

```bash
pkg update && pkg upgrade && pkg install git libxslt python
```

</details>

* On iOS you can use [iSH](https://ish.app/).

<details>
<summary>iSH Dependencies</summary>

```bash
apk update && apk upgrade && apk add git py3-pip
```

> **Note:** iSH may automatically quit if your iOS device goes to sleep, and it may lose its status if you switch to another app. You can disable auto-sleep while iSH is running by clicking the gear icon and toggling "Disable Screen Dimming". (You may wish to connect your device to a charger while running iSH.)

</details>

## Downloading and installing dumpgenerator

The Python 3 port of the `dumpgenerator` module of `wikiteam3` is largely functional and can be installed from a downloaded or cloned copy of this repository.

> If you run into a problem with the version that mostly works, you can [open an Issue](https://github.com/mediawiki-client-tools/mediawiki-dump-generator/issues/new/choose). Be sure to include the following:
>
> 1. The operating system you're using
> 2. What command you ran that didn't work
> 3. What output was printed to your terminal

### 1. Downloading and installing `MediaWiki Dump Generator`

In whatever folder you use for cloned repositories:

```bash
git clone https://github.com/mediawiki-client-tools/mediawiki-dump-generator
```

```bash
cd mediawiki-dump-generator
```

```bash
poetry update && poetry install && poetry build
```

```bash
pip install --force-reinstall dist/*.whl
```

<details>
<summary>For Windows Command Prompt, enter this pip command instead (in a batch file, use %%x in place of %x).</summary>

```bat
for %x in (dist\*.whl) do pip install --force-reinstall %x
```

</details>
<details>
<summary>For Windows PowerShell, enter this pip command instead.</summary>

```powershell
pip install --force-reinstall (Get-ChildItem .\dist\*.whl).FullName
```

</details>
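
If `dist` still holds wheels from earlier builds, the `dist/*.whl` glob in the `pip` command above expands to every one of them. One way to install only the newest wheel is sketched below. The helper name `latest_wheel` is hypothetical and not part of this repository, and the sketch assumes a Bash shell run from the repository's base directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified .whl file in a
# directory (default: dist). Returns non-zero when no wheel is found.
latest_wheel() {
  local dir=${1:-dist} newest= f
  for f in "$dir"/*.whl; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

if wheel=$(latest_wheel dist); then
  pip install --force-reinstall "$wheel"
else
  echo "No wheel found in dist/; run 'poetry build' first." >&2
fi
```

Deleting the `dist` folder before each build would achieve much the same result with less code.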

### 2. Running `dumpgenerator` for whatever purpose you need

After installing `MediaWiki Dump Generator` using `pip` you should be able to use the `dumpgenerator` command from any local directory.

```bash
dumpgenerator [args]
```
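
If the shell reports that `dumpgenerator` cannot be found, the folder where `pip` places installed scripts is probably not on `PATH`. A small sketch for checking this, along with the other tools used in these steps, follows. The helper name `missing_tools` is an assumption made for illustration only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each named command that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools git poetry pip dumpgenerator)
if [ -z "$missing" ]; then
  echo "All tools found."
else
  # Left unquoted on purpose so the names print on one line.
  echo "Not found on PATH:" $missing
fi
```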

### 3. Uninstalling the package and deleting the cloned repository when you're done

```bash
pip uninstall wikiteam3
```

```bash
rm -fr [cloned mediawiki dump generator folder]
```

### 4. Updating MediaWiki Dump Generator

> **Note:** Re-run the following steps to reinstall each time the MediaWiki Dump Generator branch is updated.

```bash
git pull
```

```bash
poetry update && poetry install && poetry build
```

```bash
pip install --force-reinstall dist/*.whl
```

<details>
<summary>For Windows Command Prompt, enter this pip command instead (in a batch file, use %%x in place of %x).</summary>

```bat
for %x in (dist\*.whl) do pip install --force-reinstall %x
```

</details>
<details>
<summary>For Windows PowerShell, enter this pip command instead.</summary>

```powershell
pip install --force-reinstall (Get-ChildItem .\dist\*.whl).FullName
```

</details>

### 5. Manually build and install `MediaWiki Dump Generator`

If you'd like to manually build and install `MediaWiki Dump Generator` from a cloned or downloaded copy of this repository, run the following commands from the downloaded base directory:

```bash
curl -sSL https://install.python-poetry.org | python3 -
```

```bash
poetry update && poetry install && poetry build
```

```bash
pip install --force-reinstall dist/*.whl
```

<details>
<summary>For Windows Command Prompt, enter this pip command instead (in a batch file, use %%x in place of %x).</summary>

```bat
for %x in (dist\*.whl) do pip install --force-reinstall %x
```

</details>
<details>
<summary>For Windows PowerShell, enter this pip command instead.</summary>

```powershell
pip install --force-reinstall (Get-ChildItem .\dist\*.whl).FullName
```

</details>

### 6. To run the test suite

To run the test suite, run:

```bash
test-dumpgenerator
```

### 7. Switching branches

```bash
git checkout --track origin/python3
```
36 changes: 36 additions & 0 deletions PUBLISHING.md
@@ -0,0 +1,36 @@
# Publishing the dump

Publishing your dumps to the [Internet Archive's wikiteam collection](https://archive.org/details/wikiteam) is straightforward. First [sign up](https://archive.org/account/signup) or [log in](http://archive.org/account/login.php).

## Launcher and uploader

Instructions on using the scripts `launcher` and `uploader` are in the file [Usage](./USAGE.md).

## Automatic publishing

Just use `uploader` (especially if you have multiple wikis): the script takes the filename of a list of wikis as its argument and uploads their dumps to archive.org. You only need to:

- Check that the 7z-compressed dumps are in the same directory as `listfile`. The file `listfile` contains a list of the api.php URLs of the wikis to upload, one per line.
- [Retrieve your S3 keys](http://www.archive.org/account/s3.php) and save them one per line (in the order provided) in a `keys.txt` file in the same directory as `uploader`.
- Run the script `uploader listfile`.
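
The checks in the list above can be partly automated before running `uploader`. The following is a minimal sketch under stated assumptions, not part of `uploader` itself: the function name `check_upload_inputs` is hypothetical, and the sketch assumes that the S3 keys page provides exactly two keys (an access key followed by a secret key).

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check. Usage: check_upload_inputs LISTFILE KEYSFILE
check_upload_inputs() {
  local listfile=$1 keysfile=$2 url keys status=0
  [ -s "$listfile" ] || { echo "missing or empty: $listfile" >&2; return 1; }
  [ -s "$keysfile" ] || { echo "missing or empty: $keysfile" >&2; return 1; }

  # Every non-blank line of the list should be an api.php URL.
  while IFS= read -r url || [ -n "$url" ]; do
    [ -z "$url" ] && continue
    case $url in
      http://*api.php|https://*api.php) ;;
      *) echo "not an api.php URL: $url" >&2; status=1 ;;
    esac
  done < "$listfile"

  # Assumption: two keys, one per line (access key, then secret key).
  keys=$(grep -c . "$keysfile" || true)
  [ "$keys" -eq 2 ] || { echo "expected 2 keys, found $keys" >&2; status=1; }

  return "$status"
}
```

For example, `check_upload_inputs listfile keys.txt && uploader listfile` would run the upload only when both files look plausible. The sketch does not verify that the 7z archives themselves are present.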

## Manual publishing

- After running dumpgenerator, in each dump folder, select all files, right-click on the selection, click 7-Zip, click `Add to archive...` and click OK.
- At Archive.org, for each wiki [create a new item](http://archive.org/create/).
- Click `Upload files`. Then either drag and drop the 7-Zip archive onto the box or click `Choose files` and select the 7-Zip archive.
- `Page Title` and `Page URL` will be filled in by the uploader.
- Add a short `Description`, such as a descriptive name for the wiki.
- Add `Subject Tags`, separated by commas. These are the keywords that will help the archive show up in an Internet Archive search, e.g. wikiteam,wiki,subjects of the wiki, and so on.
- `Creator`: can be left blank.
- `Date`: can be left blank.
- `Collection`: select `Community texts`.
- `Language`: select the language of the wiki.
- `License`: click to expand and select Creative Commons, Allow Remixing, Require Share-Alike for a CC-BY-SA licence.
- Click `Upload and Create Your Item`.

With the subject tag of wikiteam and collection of community texts, your uploads should appear in a search for [subject:"wikiteam" AND collection:opensource](https://archive.org/search?query=subject%3A%22wikiteam%22+AND+collection%3Aopensource).
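
On systems without the 7-Zip context menu, the first step of the list above (archiving each dump folder) can probably be done from the command line instead. This is a rough sketch, assuming the `7z` command-line tool (from 7-Zip or p7zip) is on `PATH`. The helper names `archive_path_for` and `archive_dump` are hypothetical, and naming the archive after its folder is an illustrative choice rather than a project convention.

```bash
#!/usr/bin/env bash
# Hypothetical helper: the archive path used for a dump folder,
# i.e. the folder path (without trailing slash) plus ".7z".
archive_path_for() {
  printf '%s.7z\n' "${1%/}"
}

# Hypothetical helper: pack the contents of one dump folder into a .7z
# archive placed next to the folder.
archive_dump() {
  local dir=${1%/}
  command -v 7z >/dev/null 2>&1 || { echo "7z not found on PATH" >&2; return 1; }
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  # "7z a" adds files to an archive; running it inside the folder keeps
  # the files at the top level of the archive.
  (cd "$dir" && 7z a "../$(basename "$dir").7z" ./*)
}
```

Calling `archive_dump` with the path of a dump folder would then leave an archive at the path printed by `archive_path_for` for the same folder, ready to be uploaded through the web form.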

## Info for developers

- [Internet Archive's S3-like server API](https://archive.org/developers/ias3.html)