% report/dataset.tex, commit 9488767 by kjellwinblad (Aug 1, 2011): "Small change to the dataset section."

We searched for an existing dataset of handwritten text, but found none that fulfilled our requirements.
The datasets we did find would have required extensive preprocessing.
Figure~\ref{} shows a sample from such a dataset.
To obtain good results from that kind of data, it would have been necessary to implement baseline and slant normalization, skew correction, skeletonization, and similar preprocessing steps.

Therefore, instead of spending a lot of time preprocessing existing datasets, we implemented a graphical user interface (GUI) for creating our own dataset.
The main advantages of this approach are that the GUI records strokes as one-pixel-wide lines and that the characters are already separated from each other.
This significantly reduced the amount of image processing, which would otherwise have been the largest part of the work.
Our dataset contains 100 examples for every capital letter in the Latin alphabet
\footnote{The dataset is available together with the source code of the system. See appendix~\ref{app:source_code}.}.
An example image from our character image dataset can be found in Figure~\ref{fig:image_feature_extraction}.
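How the GUI turns mouse movements into one-pixel-wide lines is not described in detail here. As a hedged illustration, the following Python sketch joins consecutive recorded mouse positions with Bresenham's line algorithm into a binary image. The function names \texttt{bresenham} and \texttt{rasterize\_stroke}, the list-of-lists image representation, and the use of Bresenham's algorithm itself are assumptions made for this example, not necessarily what the HandReco GUI does.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


def bresenham(p0: Point, p1: Point) -> List[Point]:
    """Return the pixels on the segment p0-p1 (both ends included).

    Bresenham's algorithm steps one pixel at a time along the major axis,
    so the resulting line is exactly one pixel wide.
    """
    x0, y0 = p0
    x1, y1 = p1
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # accumulated error term
    points = []
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy
    return points


def rasterize_stroke(samples: List[Point], width: int, height: int) -> List[List[int]]:
    """Hypothetical helper: draw a polyline through mouse samples into a 0/1 image."""
    image = [[0] * width for _ in range(height)]
    if not samples:
        return image
    # A single click (one sample) still marks one pixel.
    segments = list(zip(samples, samples[1:])) or [(samples[0], samples[0])]
    for a, b in segments:
        for x, y in bresenham(a, b):
            if 0 <= x < width and 0 <= y < height:  # ignore samples outside the canvas
                image[y][x] = 1
    return image


if __name__ == "__main__":
    img = rasterize_stroke([(0, 0), (3, 1)], width=5, height=3)
    for row in img:
        print("".join("#" if v else "." for v in row))
    # ##...
    # ..##.
    # .....
```

Because each segment is drawn with one pixel per step along its major axis, the recorded strokes need no thinning or skeletonization afterwards, which is the kind of preprocessing the GUI approach avoids.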

To obtain a dataset for training the word classifier, we created a word example generator\footnote{See \texttt{HandReco/src/api/word\_examples\_generator.py} in the source code for documentation of the word example generator. See appendix~\ref{app:source_code}.}.
The generator introduces random errors into the words given as input.
Generating the training data in this way is obviously not optimal for practical applications, but it is good enough for testing the implementation.
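The following Python sketch shows one way such a generator could work. It assumes that errors are single-letter substitutions drawn uniformly from the 26 capital Latin letters, each letter being corrupted independently with a fixed probability. The names \texttt{corrupt\_word} and \texttt{generate\_examples} and the \texttt{error\_rate} parameter are hypothetical; the actual \texttt{word\_examples\_generator.py} may use other error types and distributions.

```python
import random
import string
from typing import Iterator, List, Tuple

ALPHABET = string.ascii_uppercase  # the 26 capital Latin letters


def corrupt_word(word: str, error_rate: float, rng: random.Random) -> str:
    """Hypothetical: replace each letter, with probability error_rate,
    by a different, uniformly chosen capital letter."""
    chars = []
    for ch in word:
        if rng.random() < error_rate:
            ch = rng.choice([c for c in ALPHABET if c != ch])
        chars.append(ch)
    return "".join(chars)


def generate_examples(words: List[str], n_per_word: int, error_rate: float,
                      seed: int = 0) -> Iterator[Tuple[str, str]]:
    """Hypothetical: yield (correct_word, corrupted_word) training pairs."""
    rng = random.Random(seed)  # seeded so the generated dataset is reproducible
    for word in words:
        for _ in range(n_per_word):
            yield word, corrupt_word(word, error_rate, rng)


if __name__ == "__main__":
    for correct, noisy in generate_examples(["HELLO", "WORLD"], n_per_word=3, error_rate=0.2):
        print(correct, noisy)
```

Substitution-only errors keep the word length unchanged, which is a plausible fit when each separated character image yields exactly one character guess; insertions or deletions would need a different error model.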
