Laserprec/jupyter book doc (#28)
* Add documentation in jupyter-book
* Add Trademark Notice
Jianjie Liu committed Jul 19, 2021
1 parent 6180948 commit 0e982f2
Showing 40 changed files with 1,497 additions and 327 deletions.
22 changes: 14 additions & 8 deletions README.md
@@ -2,7 +2,7 @@

[![Build Status](https://dev.azure.com/genalog-dev/genalog/_apis/build/status/Nightly-Build?branchName=main)](https://dev.azure.com/genalog-dev/genalog/_build/latest?definitionId=4&branchName=main) ![Azure DevOps tests (compact)](https://img.shields.io/azure-devops/tests/genalog-dev/genalog/4?compact_message) ![Azure DevOps coverage (main)](https://img.shields.io/azure-devops/coverage/genalog-dev/genalog/4/main) ![Python Versions](https://img.shields.io/badge/py-3.6%20%7C%203.7%20%7C%203.8%20-blue) ![Supported OSs](https://img.shields.io/badge/platform-%20linux--64%20-red) ![MIT license](https://img.shields.io/badge/License-MIT-blue.svg)

Genalog is an open source, cross-platform python package allowing to generate synthetic document images with text data. Tool also allows you to add various text degradations to these images. The purpose of this tool is to provide a fast and efficient way to generate synthetic documents from text data by leveraging layout from templates that you create in simple HTML format.
`Genalog` is an open source, cross-platform Python package for **gen**erating document images with synthetic noise that mimics scanned an**alog** documents (thus the name `genalog`). You can also add various text degradations to these images. The purpose of this tool is to provide a fast and efficient way to generate synthetic documents from text data by leveraging layout from templates that you create in simple HTML format.

Overview
-------------------------------------
@@ -85,16 +85,23 @@ If you are running on Windows, MacOS, or other Linux distributions, please see [

Repo Structure
-------------------
Tools-Synthetic-Data-Generator
genalog
├────genalog
│ ├─── generation # generate text images
│ ├──── degradation # methods for image degradation
│ ├──── ocr # running the Azure Search Pipeline
│ └──── text # methods to Align OCR Output Text with Input Text
├────examples # Example Jupyter Notebooks for Various Synthetic Data Generation Scenarios
├────tests # PyTest files
├────README.md # Main Readme file
└────LICENSE # License file
│ └──── text # methods to Align OCR Output Text with Input Text
├────devops # CI/CD pipelines
├────docs # online documentation
├────examples # example Jupyter Notebooks for Various Synthetic Data Generation Scenarios
├────tests # tests
├────tox.ini # CI orchestration and configurations
├────README.md
└────LICENSE

Trademark Notice
--------------------
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.

Microsoft Open Source Code of Conduct
-------------------------------------
@@ -118,7 +125,6 @@ For more information see the [Code of Conduct FAQ](https://opensource.microsoft.
or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.



Collaborators
-------------------------------------
Genalog was originally developed by the [MAIDAP team at Microsoft Cambridge NERD](http://www.microsoftnewengland.com/nerd-ai/) in association with the Text Analytics Team in Redmond.
6 changes: 3 additions & 3 deletions docs/.gitignore
@@ -1,3 +1,3 @@
_build/
_static/
_templates/
**/example.txt
**/_build
**/data
20 changes: 0 additions & 20 deletions docs/Makefile

This file was deleted.

67 changes: 0 additions & 67 deletions docs/conf.py

This file was deleted.

29 changes: 0 additions & 29 deletions docs/genalog/genalog.degradation.rst

This file was deleted.

32 changes: 0 additions & 32 deletions docs/genalog/genalog.rst

This file was deleted.

46 changes: 46 additions & 0 deletions docs/genalog_docs/_config.yml
@@ -0,0 +1,46 @@
title : <h1 style="font-size:2em;text-align:center;color:#FF5733">Genalog</h1>
author: Jianjie Liu and Amit Gupte
# logo: 'qe-logo-large.png'

# Short description about the book
description: >-
  Guide for end-to-end synthetic analog document generation
execute:
  execute_notebooks : off

# Interact link settings
notebook_interface : "notebook"

# Launch button settings
repository:
  url : https://github.com/microsoft/genalog
  path_to_book : /docs/genalog_docs
  branch : main

launch_buttons:
  notebook_interface : classic

# HTML-specific settings
html:
  home_page_in_navbar : false
  use_repository_button : true

# # LaTeX settings
# bibtex_bibfiles:
#   - _bibliography/references.bib
# latex:
#   latex_engine : "xelatex"
#   latex_documents:
#     targetname: book.tex

sphinx:
  extra_extensions:
    - sphinx_inline_tabs
    - sphinx.ext.autodoc
    - sphinx.ext.napoleon
    - sphinx.ext.viewcode
  config:
    napoleon_google_docstring: True
    autodoc_member_order: groupwise
    autoclass_content: both
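
The `sphinx` settings above enable `sphinx.ext.autodoc` and `sphinx.ext.napoleon` with `napoleon_google_docstring: True`, which suggests the API pages are built from Google-style docstrings. The following is a minimal sketch of a docstring in that style. The function `flip_pixels` and its parameters are hypothetical, written only for illustration, and are not part of genalog.

```python
import random


def flip_pixels(pixels, rate=0.1, seed=None):
    """Flip a fraction of binary pixels to mimic scanner noise.

    Hypothetical helper for illustration only; not part of genalog.

    Args:
        pixels (list[int]): Flat list of 0/1 pixel values.
        rate (float): Probability in [0, 1] that a pixel is flipped.
        seed (int, optional): Seed for reproducible noise. Defaults to None.

    Returns:
        list[int]: New list of the same length with some pixels flipped.

    Raises:
        ValueError: If ``rate`` is outside [0, 1].
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)
    # random() returns a value in [0, 1), so rate=0 never flips and rate=1 always flips.
    return [1 - p if rng.random() < rate else p for p in pixels]
```

With napoleon enabled, the `Args`, `Returns`, and `Raises` sections are converted to reStructuredText fields before autodoc renders them. `autoclass_content: both` additionally concatenates a class docstring with its `__init__` docstring, and `autodoc_member_order: groupwise` groups members by kind.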
24 changes: 24 additions & 0 deletions docs/genalog_docs/_toc.yml
@@ -0,0 +1,24 @@
root: index
format: jb-book
defaults:
  numbered: false
parts:
  - caption: Getting Started
    chapters:
      - file: installation
      - file: generation_pipeline
      - file: e2e_dataset_pipeline
  - caption: Fabricating Document & Noise
    chapters:
      - file: doc_generation
      - file: doc_degradation
  - caption: Handling Noisy Text
    chapters:
      - file: text_alignment
      - file: ocr_label_propagation
  - caption: API Documentation
    chapters:
      - file: docstring/genalog.degradation
      - file: docstring/genalog.generation
      - file: docstring/genalog.ocr
      - file: docstring/genalog.text
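
To make the reading order of this table of contents concrete, here is a small sketch that walks the same structure and lists the root page followed by every chapter file. The `TOC` dictionary is a hand-written mirror of the YAML above (used to avoid a YAML parser dependency), and `list_toc_files` is a hypothetical helper, not something jupyter-book provides.

```python
# Hand-written mirror of docs/genalog_docs/_toc.yml; an assumption for
# illustration, not loaded from the real file.
TOC = {
    "root": "index",
    "format": "jb-book",
    "defaults": {"numbered": False},
    "parts": [
        {"caption": "Getting Started",
         "chapters": [{"file": "installation"},
                      {"file": "generation_pipeline"},
                      {"file": "e2e_dataset_pipeline"}]},
        {"caption": "Fabricating Document & Noise",
         "chapters": [{"file": "doc_generation"},
                      {"file": "doc_degradation"}]},
        {"caption": "Handling Noisy Text",
         "chapters": [{"file": "text_alignment"},
                      {"file": "ocr_label_propagation"}]},
        {"caption": "API Documentation",
         "chapters": [{"file": "docstring/genalog.degradation"},
                      {"file": "docstring/genalog.generation"},
                      {"file": "docstring/genalog.ocr"},
                      {"file": "docstring/genalog.text"}]},
    ],
}


def list_toc_files(toc):
    """Return the root file followed by every chapter file, in reading order."""
    files = [toc["root"]]
    for part in toc.get("parts", []):
        for chapter in part.get("chapters", []):
            files.append(chapter["file"])
    return files


if __name__ == "__main__":
    for name in list_toc_files(TOC):
        print(name)
```

Each entry is a path relative to the book directory, written without a file extension.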