mousai

This project uses texts from Project Gutenberg and the Brown Corpus to generate a series of lines that should be semantically coherent. A parallel version will follow once the HMMTagger has been fixed and the program has been fully tested.

// Scan books or other texts that do not have a lot of symbols to build the vocabulary
→ Scan input files:
	→ Extract sentences using end-of-sentence markers [? . !]
	→ Clean sentences by removing extra spaces and other unnecessary symbols
	→ Build a model of sentence lengths
→ Build the corpus vocabulary by splitting sentences and counting word frequencies (see the sketch below)
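
A minimal sketch of this scan step, assuming plain-text input and a simple regex split on sentence-ending punctuation. Class and method names are illustrative, not the repository's actual API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CorpusScanner {

    // Split raw text into sentences on the end-of-sentence markers [? . !]
    public static List<String> extractSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.split("[.!?]")) {
            // Clean: drop stray symbols and collapse whitespace
            String cleaned = s.replaceAll("[^a-zA-Z' ]", " ")
                              .replaceAll("\\s+", " ")
                              .trim();
            if (!cleaned.isEmpty()) {
                sentences.add(cleaned);
            }
        }
        return sentences;
    }

    // Build the vocabulary as word -> frequency
    public static Map<String, Integer> buildVocabulary(List<String> sentences) {
        Map<String, Integer> vocab = new HashMap<>();
        for (String sentence : sentences) {
            for (String word : sentence.toLowerCase().split(" ")) {
                vocab.merge(word, 1, Integer::sum);
            }
        }
        return vocab;
    }

    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get(args[0])));
        List<String> sentences = extractSentences(text);
        Map<String, Integer> vocab = buildVocabulary(sentences);
        System.out.println(sentences.size() + " sentences, " + vocab.size() + " distinct words");
    }
}
```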

// Initialize the Markov chain based on the input texts
→ Initialize a bi-/unigram prefix dictionary { prefix : array of suffixes }
→ For every sentence:
	→ Update the dictionary with its prefixes and suffixes (duplicates allowed)
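
A sketch of the chain construction, assuming keys are unigram or bigram prefixes and values are lists of observed next words, with duplicates kept so that sampling reflects frequency. Names are illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class MarkovChain {

    // prefix (one or two words) -> all observed next words, duplicates kept
    private final Map<String, List<String>> transitions = new HashMap<>();
    private final Random random = new Random();

    public void addSentence(String sentence) {
        String[] words = sentence.split(" ");
        for (int i = 0; i < words.length - 1; i++) {
            // Unigram prefix
            transitions.computeIfAbsent(words[i], k -> new ArrayList<>()).add(words[i + 1]);
            // Bigram prefix
            if (i < words.length - 2) {
                String bigram = words[i] + " " + words[i + 1];
                transitions.computeIfAbsent(bigram, k -> new ArrayList<>()).add(words[i + 2]);
            }
        }
    }

    // Sample a suffix uniformly; duplicates make frequent continuations more likely
    public String nextWord(String prefix) {
        List<String> suffixes = transitions.get(prefix);
        if (suffixes == null || suffixes.isEmpty()) {
            return null;
        }
        return suffixes.get(random.nextInt(suffixes.size()));
    }
}
```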


// Initialize and train the POS-tagging VMM using the Brown Corpus
→ Scan Brown Corpus input files:
	→ Extract word-POS pairs
	→ Extract POS patterns using end-of-sentence markers as delimiters
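
The Brown Corpus stores each token as word/TAG; a minimal sketch of extracting the tag patterns per sentence, with file-layout assumptions and names that are illustrative only:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class BrownCorpusReader {

    // One POS pattern per sentence, e.g. ["at", "np-tl", "nn-tl", ...]
    public static List<List<String>> extractTagPatterns(String path) throws IOException {
        List<List<String>> patterns = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(path))) {
            for (String token : line.trim().split("\\s+")) {
                int slash = token.lastIndexOf('/');
                if (slash <= 0) {
                    continue; // not a word/TAG token
                }
                String word = token.substring(0, slash);
                String tag = token.substring(slash + 1).toLowerCase();
                current.add(tag);
                // End-of-sentence markers delimit one POS pattern
                if (word.equals(".") || word.equals("?") || word.equals("!")) {
                    patterns.add(current);
                    current = new ArrayList<>();
                }
            }
        }
        if (!current.isEmpty()) {
            patterns.add(current);
        }
        return patterns;
    }
}
```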

// Genetic Algorithm (GA)
→ Initialize ⇒ Build a population of N individuals, with each individual having:
	→ A sentence generated using the Markov chain trained on the input texts/books
	→ A POS tag sequence for the sentence [initialized as an empty ArrayList]
	→ A fitness value for the sentence [initialized as 0]
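
The individual can be as small as a record of those three fields; a sketch, where the class name and fields are assumptions rather than the repository's actual types:

```java
import java.util.ArrayList;
import java.util.List;

// One GA individual: a candidate sentence, its estimated POS tags, and its fitness
public class Individual {
    List<String> words;     // the sentence generated by the Markov chain
    List<String> posTags;   // filled in by the Viterbi step during evaluation
    double fitness;         // 0 until the individual has been evaluated

    public Individual(List<String> words) {
        this.words = words;
        this.posTags = new ArrayList<>();
        this.fitness = 0.0;
    }
}
```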

→ Evaluation ⇒ For each individual in the population:
	→ Estimate the individual's POS tag sequence using the Viterbi algorithm on the VMM
	→ Calculate the individual's fitness value based on the POS tag pattern frequency in the Brown Corpus. As of now there is no smoothing for patterns that are not recognized (see the sketch below).
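
A sketch of the fitness step only, assuming the Viterbi step has already produced a tag sequence and that the Brown Corpus POS patterns have been counted into a frequency map; with no smoothing, an unseen pattern simply scores 0. Names are illustrative:

```java
import java.util.List;
import java.util.Map;

public class Evaluator {

    // patternCounts: POS pattern (tags joined by spaces) -> occurrences in the Brown Corpus
    private final Map<String, Integer> patternCounts;
    private final int totalPatterns;

    public Evaluator(Map<String, Integer> patternCounts) {
        this.patternCounts = patternCounts;
        this.totalPatterns = patternCounts.values().stream().mapToInt(Integer::intValue).sum();
    }

    // Fitness = relative frequency of the sentence's POS pattern; unseen patterns get 0 (no smoothing yet)
    public double fitness(List<String> posTags) {
        String pattern = String.join(" ", posTags);
        int count = patternCounts.getOrDefault(pattern, 0);
        return totalPatterns == 0 ? 0.0 : (double) count / totalPatterns;
    }
}
```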

→ Selection ⇒ This is where the elitist part of the algorithm comes in:
	→ If the population has K individuals with the required fitness:
		// This part hasn't been implemented because a good enough
		// threshold hasn't been found yet. It shouldn't be hard to add.
		→ BREAK FROM GENETIC ALGORITHM
	→ Else:
		→ Sort the individuals by fitness
		→ Pick the N / 2 fittest individuals [fitness > 0]
		→ If there are fewer than N / 2 individuals with fitness > 0:
			→ Discard the individuals with fitness = 0
			→ Replace them with a new set of randomly generated individuals
			→ GOTO: Evaluation
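
A sketch of the non-elitist branch: sort by fitness, keep the top half, and backfill with fresh random individuals where fitness is 0. The threshold-based early exit is left out since the README notes it isn't implemented; this reuses the hypothetical Individual class above, and names are illustrative:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Supplier;

public class Selection {

    // Keep the N/2 fittest individuals; replace zero-fitness survivors with fresh random individuals
    public static List<Individual> select(List<Individual> population,
                                          Supplier<Individual> randomIndividual) {
        int half = population.size() / 2;
        List<Individual> sorted = new ArrayList<>(population);
        sorted.sort(Comparator.comparingDouble((Individual i) -> i.fitness).reversed());

        List<Individual> survivors = new ArrayList<>();
        for (Individual individual : sorted.subList(0, half)) {
            if (individual.fitness > 0) {
                survivors.add(individual);             // keep fit individuals
            } else {
                survivors.add(randomIndividual.get()); // backfill; caller re-runs Evaluation
            }
        }
        return survivors;
    }
}
```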


→ Reproduction / Crossover ⇒ Do this N / 2 times:
	→ Randomly pick 2 parents from the N / 2 selected individuals
	→ Duplicate them
	→ Cross the duplicates at a position determined by the VMM training set
	→ Add the duplicates (children) to the population
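
A sketch of single-point crossover on the word lists; the README says the crossover point comes from the VMM training set, which is simplified here to a random cut. It reuses the hypothetical Individual class above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class Crossover {

    private static final Random RANDOM = new Random();

    // Swap the tails of two duplicated parents at a single cut point, producing two children
    public static List<Individual> cross(Individual parentA, Individual parentB) {
        int cut = 1 + RANDOM.nextInt(Math.max(1,
                Math.min(parentA.words.size(), parentB.words.size()) - 1));

        List<String> childAWords = new ArrayList<>(parentA.words.subList(0, cut));
        childAWords.addAll(parentB.words.subList(cut, parentB.words.size()));

        List<String> childBWords = new ArrayList<>(parentB.words.subList(0, cut));
        childBWords.addAll(parentA.words.subList(cut, parentA.words.size()));

        List<Individual> children = new ArrayList<>();
        children.add(new Individual(childAWords));
        children.add(new Individual(childBWords));
        return children;
    }
}
```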

→ Mutation ⇒ For every individual in the population:
	→ Generate a random double
	→ If the double is less than the mutation rate:
		→ Replace a random word with another word that has the same POS tag
	→ GOTO: Evaluation
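
A sketch of the mutation step, assuming a map from POS tag to candidate words built from the tagged corpus; names are illustrative and the Individual class is the hypothetical one above:

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

public class Mutation {

    private static final Random RANDOM = new Random();

    // With probability mutationRate, swap one random word for another word with the same POS tag
    public static void mutate(Individual individual,
                              Map<String, List<String>> wordsByTag,
                              double mutationRate) {
        if (RANDOM.nextDouble() >= mutationRate || individual.words.isEmpty()
                || individual.posTags.size() != individual.words.size()) {
            return;
        }
        int position = RANDOM.nextInt(individual.words.size());
        List<String> candidates = wordsByTag.get(individual.posTags.get(position));
        if (candidates != null && !candidates.isEmpty()) {
            individual.words.set(position, candidates.get(RANDOM.nextInt(candidates.size())));
        }
    }
}
```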
