Skip to content

Commit

Permalink
Laserprec/production_release (#30)
Browse files Browse the repository at this point in the history
* Add github release step

* Test upload with sdist

* Update badge and doc

* Bump to 0.1.0rc4

* Debug prod release

* Undo debugging print

* Attempt to stabilize flaky tests

* Skip flaky tests

* Reorder welcome sections in main README

* Add steps to publish docs on github page
  • Loading branch information
Jianjie Liu committed Jul 20, 2021
1 parent 0e982f2 commit 7c25f06
Show file tree
Hide file tree
Showing 9 changed files with 110 additions and 36 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ test_out
# Environments
.env*
.venv*
**/.env/
env/
venv/
ENV/
Expand Down
22 changes: 22 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# Genalog Changelog
All notable changes to this project will be documented in this file.

Types of changes
1. `Added` for new features.
1. `Changed` for changes in existing functionality.
1. `Deprecated` for soon-to-be removed features.
1. `Removed` for now removed features.
1. `Fixed` for any bug fixes.
1. `Security` in case of vulnerabilities.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and we adopt the [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [v0.1.0] - 2021-07-19
### Added
- Initial package release:
- 3 standard HTML document template for generation
- basic image degradation effects including blur, bleed-through, salt & pepper and other morphological operations.
- 2 flavors of text alignment algorithm: Needleman-Wunsch (shorter text segments) and RETAS (longer text segments)
- Full e2e NER-OCR label generation notebooks
- See [documentation](https://microsoft.github.io/genalog/installation.html) for more on the initial features of the package.
52 changes: 24 additions & 28 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
# Genalog - Synthetic Data Generator

[![Build Status](https://dev.azure.com/genalog-dev/genalog/_apis/build/status/Nightly-Build?branchName=main)](https://dev.azure.com/genalog-dev/genalog/_build/latest?definitionId=4&branchName=main) ![Azure DevOps tests (compact)](https://img.shields.io/azure-devops/tests/genalog-dev/genalog/4?compact_message) ![Azure DevOps coverage (main)](https://img.shields.io/azure-devops/coverage/genalog-dev/genalog/4/main) ![Python Versions](https://img.shields.io/badge/py-3.6%20%7C%203.7%20%7C%203.8%20-blue) ![Supported OSs](https://img.shields.io/badge/platform-%20linux--64%20-red) ![MIT license](https://img.shields.io/badge/License-MIT-blue.svg)
[![Build Status](https://dev.azure.com/genalog-dev/genalog/_apis/build/status/Nightly-Build?branchName=main)](https://dev.azure.com/genalog-dev/genalog/_build/latest?definitionId=4&branchName=main) ![Azure DevOps tests (compact)](https://img.shields.io/azure-devops/tests/genalog-dev/genalog/4?compact_message) ![Azure DevOps coverage (main)](https://img.shields.io/azure-devops/coverage/genalog-dev/genalog/4/main) ![Python Versions](https://img.shields.io/badge/py-3.6%20%7C%203.7%20%7C%203.8%20-blue) ![Supported OSs](https://img.shields.io/badge/platform-%20linux--64%20-red) ![MIT license](https://img.shields.io/badge/License-MIT-blue.svg) [![docs link](https://img.shields.io/badge/docs-jupyter--book-brightgreen)](https://microsoft.github.io/genalog/)

`Genalog` is an open source, cross-platform python package for **gen**erating document images with synthetic noise that mimics scanned an**alog** documents (thus the name `genalog`). You can also add various text degradations to these images. The purpose of this tool is to provide a fast and efficient way to generate synthetic documents from text data by leveraging layout from templates that you create in simple HTML format.
Genalog is an open source, cross-platform python package for **gen**erating document images with synthetic noise that mimics scanned an**alog** documents (thus the name `genalog`). You can also add various text degradations to these images. The purpose of this tool is to provide a fast and efficient way to generate synthetic documents from text data by leveraging layout from templates that you create in simple HTML format.

Overview
-------------------------------------
Expand All @@ -15,11 +15,31 @@ Genalog has various capabilities:

The aim of this project is to provide a complete solution for generating synthetic images from any text data rich in natural language and to imitate most of OCR noises founded in scanned text documents.

Please refer to our [Genalog documentation](https://microsoft.github.io/genalog) for more tutorials.

## Installation
See the [Genalog install guide](https://microsoft.github.io/genalog/installation.html) for more details.

To install the latest release:

`pip install genalog`

### Extra Installation Steps in MacOs and Windows
We have a dependency on [`Weasyprint`](https://weasyprint.readthedocs.io/en/stable/install.html), which in turn has non-python dependencies including `Pango`, `cairo` and `GDK-PixBuf` that need to be installed separately.

So far, `Pango`, `cairo` and `GDK-PixBuf` libraries are available in `Ubuntu-18.04` and later by default.

If you are running on Windows, MacOS, or other Linux distributions, please see [installation instructions from WeasyPrint](https://weasyprint.readthedocs.io/en/stable/install.html).

**NOTE**: If you encounter the errors like `no library called "libcairo-2" was found`, this is probably due to the three extra dependencies missing.

## Getting Started

The following is a summary of the common applications scenarios of Genalog. Please refer the [Jupyter notebook examples](https://github.com/microsoft/genalog/blob/master/example) that make use of the core code base of Genalog and repository utilities.

### TLDR
If you are interested in a full document generation and degration pipeline, please see the following notebook:

||Description|Indepth Jupyter Notebook Examples|
|-|-------------------------|--------|
|1|Analog Document Generation Pipeline|[Demo Notebook](https://github.com/microsoft/genalog/blob/master/example/generation_pipeline.ipynb)|[Here is guide to the core components](https://github.com/microsoft/genalog/blob/master/genalog/README.md)|
Expand All @@ -28,7 +48,7 @@ If you are interested in a full document generation and degration pipeline, plea
Else we have in-depth walkthroughs of each of the module in Genalog.

<p float="left">
<img src="example/static/genalog_components.png" width="900" />
<img src="https://github.com/microsoft/genalog/blob/main/example/static/genalog_components.png?raw=true" width="900" />
</p>

||Steps|Indepth Jupyter Notebook Examples|Quick Start Guides|
Expand All @@ -42,37 +62,13 @@ Else we have in-depth walkthroughs of each of the module in Genalog.
We also provide notebooks for the complete end-to-end scenario of generating a synthetic dataset connecting all the components of genalog:

<p float="left">
<img src="example/static/labeled_synthetic_pipeline.png" width="900" />
<img src="https://github.com/microsoft/genalog/blob/main/example/static/labeled_synthetic_pipeline.png?raw=true" width="900" />
</p>

||Scenario|Indepth Jupyter Notebook|
|-|-------------------------|--------|
|1|Synthetic Dataset Generation with LABELED NER Dataset|[Demo Notebook](https://github.com/microsoft/genalog/blob/master/example/dataset_generation.ipynb)|

Installation
-----------------------------
We are currently in a pre-release stage. Stable release is currently pushed to the [TestPyPI](https://test.pypi.org/project/genalog/).

`pip install -i https://test.pypi.org/simple/ genalog --extra-index-url https://pypi.org/simple`

### Extra Installation Steps in MacOs and Windows
We have a dependency on [`Weasyprint`](https://weasyprint.readthedocs.io/en/stable/install.html), which in turn has non-python dependencies including `Pango`, `cairo` and `GDK-PixBuf` that need to be installed separately.

So far, `Pango`, `cairo` and `GDK-PixBuf` libraries are available in `Ubuntu-18.04` and later by default.

If you are running on Windows, MacOS, or other Linux distributions, please see [installation instructions from WeasyPrint](https://weasyprint.readthedocs.io/en/stable/install.html).

**NOTE**: If you encounter the errors like `no library called "libcairo-2" was found`, this is probably due to the three extra dependencies missing.

### Installation from Source:

1. Create and activate the virtual environment you want to install the package:
1. `python -m venv .env`
1. `pip install --upgrade pip setuptools`
1. `source .env/bin/activate` or on Windows `.env/Scripts/activate.bat`
1. `git clone https://github.com/microsoft/genalog.git`
1. `cd genalog`
1. `pip install -e .`

### Other Requirements:

Expand Down
28 changes: 28 additions & 0 deletions RELEASE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
# Toucan Release Procedure

Checklist for the release process of `genalog`:

### Preparation
- [x] Ensure `main` branch contains all relevant changes and PRs relating to the specific release is merged
- [x] Create and switch to a new release branch (i.e. release-X.Y.Z)

### Package Metadata Update
- [x] Update VERSION.txt with version bump. Please reference [Semantic Versioning](https://semver.org/).
- [x] Update [CHANGELOG.md](./CHANGELOG.md)
- [x] Commit the above changes with title "Release vX.Y.Z"
- [x] Generate a new git tag for the new version (e.g. `git tag -a v0.1.0 -m "Initial Release"`)
- [x] Push the new tag to remote `git push origin v0.1.0`
- [x] Create a new PR with the above changes into `main` branch.

### Release to PyPI
- [x] Manually trigger the [release pipeline](https://dev.azure.com/genalog-dev/genalog/_build?definitionId=2) in DevOps on the release branch, this will publish latest version of `genalog` to PyPI.
- [x] Select `releaseType` to `Test` to test out the release in [TestPyPI](https://test.pypi.org/project/genalog/)
- [x] Rerun and switch `releaseType` to production if looks good.
- [x] If the pipeline ran successfully, check and publish the draft of this release on [Github Release](https://github.com/microsoft/genalog/releases)
- [x] Latest version is pip-installable with:
- `pip install genalog`

### Update Documentation on Github Page
- [x] Staying on the release branch, `cd docs && pip install -r requirements-doc.txt`
- [x] Build the jupyter-book with `jupyter-book build --all genalog_docs`
- [x] Preview the HTML files, if looks good [publish to Github Page](https://jupyterbook.org/start/publish.html#publish-your-book-online-with-github-pages): `ghp-import -n -p -f genalog_docs/_build/html`
2 changes: 1 addition & 1 deletion VERSION.txt
Original file line number Diff line number Diff line change
@@ -1 +1 @@
0.0.1-alpha3
0.1.0-rc5
31 changes: 27 additions & 4 deletions devops/release.yml
Original file line number Diff line number Diff line change
Expand Up @@ -35,8 +35,9 @@ steps:
pip install --upgrade pip
pip install setuptools wheel
python setup.py bdist_wheel --dist-dir dist
python setup.py sdist --dist-dir dist
workingDirectory: $(Build.SourcesDirectory)
displayName: 'Building wheel package'
displayName: 'Building wheel package & sdist'

- bash: |
pip install twine
Expand All @@ -47,9 +48,31 @@ steps:
inputs:
pythonUploadServiceConnection: testpypi
condition: ${{eq(parameters.releaseType, 'Test')}}
displayName: 'Twine Authentication for ${{parameters.releaseType}}'
displayName: 'Twine Authentication for Test'

- task: TwineAuthenticate@1
inputs:
pythonUploadServiceConnection: pypi
condition: ${{eq(parameters.releaseType, 'Production')}}
displayName: 'Twine Authentication for Production'

- bash: |
twine upload --verbose -r genalog --config-file $(PYPIRC_PATH) dist/*
twine upload --verbose -r genalog --config-file $(PYPIRC_PATH) dist/*.whl
workingDirectory: $(Build.SourcesDirectory)
displayName: 'Uploading wheel package to ${{parameters.releaseType}} PyPI'
displayName: 'Uploading Wheel to ${{parameters.releaseType}} PyPI'

- task: GitHubRelease@1
inputs:
gitHubConnection: 'github.com_laserprec'
repositoryName: 'microsoft/genalog'
action: 'create'
target: '$(Build.SourceVersion)'
tagSource: 'gitTag'
tagPattern: 'v.*'
releaseNotesFilePath: 'CHANGELOG.md'
assets: '$(Build.SourcesDirectory)/dist/*'
isDraft: true
changeLogCompareToRelease: 'lastFullRelease'
changeLogType: 'commitBased'
condition: ${{eq(parameters.releaseType, 'Test')}}
displayName: 'Prepare GitHub Release (Draft)'
2 changes: 1 addition & 1 deletion docs/genalog_docs/index.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Synthetic Document Generator

[![Build Status](https://dev.azure.com/genalog-dev/genalog/_apis/build/status/Nightly-Build?branchName=main)](https://dev.azure.com/genalog-dev/genalog/_build/latest?definitionId=4&branchName=main) ![Azure DevOps tests (compact)](https://img.shields.io/azure-devops/tests/genalog-dev/genalog/4?compact_message) ![Azure DevOps coverage (main)](https://img.shields.io/azure-devops/coverage/genalog-dev/genalog/4/main) ![Python Versions](https://img.shields.io/badge/py-3.6%20%7C%203.7%20%7C%203.8%20-blue) ![Supported OSs](https://img.shields.io/badge/platform-%20linux--64%20-red) ![MIT license](https://img.shields.io/badge/License-MIT-blue.svg)
[![Build Status](https://dev.azure.com/genalog-dev/genalog/_apis/build/status/Nightly-Build?branchName=main)](https://dev.azure.com/genalog-dev/genalog/_build/latest?definitionId=4&branchName=main) ![Azure DevOps tests (compact)](https://img.shields.io/azure-devops/tests/genalog-dev/genalog/4?compact_message) ![Azure DevOps coverage (main)](https://img.shields.io/azure-devops/coverage/genalog-dev/genalog/4/main) ![Python Versions](https://img.shields.io/badge/py-3.6%20%7C%203.7%20%7C%203.8%20-blue) ![Supported OSs](https://img.shields.io/badge/platform-%20linux--64%20-red) ![MIT license](https://img.shields.io/badge/License-MIT-blue.svg) [![docs link](https://img.shields.io/badge/docs-jupyter--book-brightgreen)](https://microsoft.github.io/genalog/)

````{margin}
```sh
Expand Down
6 changes: 5 additions & 1 deletion tests/e2e/test_ocr_e2e.py
Original file line number Diff line number Diff line change
Expand Up @@ -48,9 +48,13 @@ def test_upload_images(self, use_async):
), f"folder {dst_folder} was not deleted"


@pytest.mark.skip(reason=(
"Flaky test. Going to deprecate the ocr module in favor of the official python SDK:\n"
"https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/quickstarts-sdk/client-library?tabs=visual-studio&pivots=programming-language-python" # noqa:E501
))
@pytest.mark.azure
class TestGROKe2e:
@pytest.mark.parametrize("use_async", [False, True])
@pytest.mark.parametrize("use_async", [False])
def test_grok_e2e(self, tmpdir, use_async):
grok = Grok.create_from_env_var()
src_folder = "tests/unit/ocr/data/img"
Expand Down
2 changes: 1 addition & 1 deletion tox.ini
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,6 @@ application-import-names=genalog, tests
# Native flake8 configs
max-line-length = 140
exclude =
build, dist
build, dist, docs
.env*,.venv* # local virtual environments
.tox

0 comments on commit 7c25f06

Please sign in to comment.