Commit
Made corrections and changes in dataset.tex.
1 parent d935813, commit 18e92b0
Showing 1 changed file with 12 additions and 8 deletions.
@@ -1,13 +1,17 @@

-We tried to find a dataset with handwritten text at the beginning of the project, but it turns out there are not that many available.
-% Not sure what Image is supposed to be referenced here.
-The datasets that do exist, like following image example(Figure 2), they would have needed a lot of preprocessing before we could use them in our project.
-We would have had to implement baseline slant normalization, skew correction, skeleton and so on.
+An attempt was made to find a dataset with handwritten text, but we failed to find one that suited our requirements.
+The datasets that were found would require a lot of preprocessing.
+Figure~\ref{} shows a sample from that kind of dataset.
+To get good results from that kind of dataset it would be necessary to implement baseline slant normalization, skew correction, skeletonization and so on.

 Therefore, instead of spending a lot of time preprocessing the datasets, we implemented a Graphical User Interface to create our own dataset.
-The biggest advantages of this solution is that our solution records one pixel wide letters and the characters are already separated.
-The most important part of the work, image processing, was thus reduced significantly.
+The largest advantages of this solution are that it records one-pixel-wide lines and that the characters are already separated.
+A large part of the work, image processing, was thus reduced significantly.
+Our dataset contains 100 examples for every capital letter in the Latin alphabet
+\footnote{The dataset is available together with the source code of the system. See appendix~\ref{app:source_code}.}.
+An example image from our character image dataset can be found in Figure~\ref{fig:image_feature_extraction}.

-Furthermore, if the vocabulary is relatively large, we found that it became easier for us to test the HMM.
-This is because our word training data is made up of randomly chosen samples.
+To get a dataset for training the word classifier, a generator was created\footnote{Please see HandReco\/src\/api\/word\_examples\_generator.py in the source code for documentation of the word example generator. See appendix~\ref{app:source_code}.}.
+The generator creates random errors in the words given as input.
+Generating the dataset is obviously not optimal for practical applications, but it is good enough to test the implementation.

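The added text says only that the generator creates random errors in the words given as input; the real implementation lives in word\_examples\_generator.py and is not shown in this commit. As a rough illustration of the idea, the following Python sketch corrupts words with random substitutions, deletions, and insertions. The function names, the three error types, and the `error_rate` parameter are all assumptions made for this example, not the project's actual API.

```python
import random
import string

# Assumption: words use capital Latin letters only, matching the character dataset.
ALPHABET = string.ascii_uppercase


def corrupt_word(word, rng, error_rate=0.1):
    """Return a copy of `word` with random errors (hypothetical helper).

    Each character is hit by an error with probability `error_rate`;
    the error is a substitution, a deletion, or an insertion.
    """
    out = []
    for ch in word:
        if rng.random() < error_rate:
            kind = rng.choice(("substitute", "delete", "insert"))
            if kind == "substitute":
                out.append(rng.choice(ALPHABET))
            elif kind == "insert":
                out.append(ch)
                out.append(rng.choice(ALPHABET))
            # "delete": the character is simply dropped
        else:
            out.append(ch)
    return "".join(out)


def generate_examples(words, n_per_word, seed=0, error_rate=0.1):
    """Build (noisy_word, true_word) pairs for training a word classifier."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    return [(corrupt_word(w, rng, error_rate), w)
            for w in words
            for _ in range(n_per_word)]
```

Calling `generate_examples(["HELLO", "WORLD"], 3)` would yield six labelled pairs, three per word. Seeding the generator keeps the synthetic dataset reproducible between test runs, which suits the stated goal of testing the implementation rather than modelling real recognition errors.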