I have a prototype of they system for corrupting sentences, for different semantic meaning:

My process is:

     Tokenize the sentence (Currently using the NLTK regex tokenizer, it seem sufficient)
     Parts of Speech tag (Currently using the Stanford POS Tagger (via NLTK))
     For each word that is not blacklisted (I have blacklisted "had", "were", "have", "was", and "be", as they are unusual verbs with strange antoymns, further more the are rather syntatic)
         Use WordNet to find antonyms of the same POS tag  (So "Larger" (Noun, as in beer) has no antonyms, but "larger" (Ajd) has "Little" as an anytonym.
        Unstem: WordNet stemming/lemmaisation (of the antonym) removes Tense, Plurality, comparativeness, and superlativeness, so I make use the POS tag of the original to work those out, then restore them using the Pattern library's tools for this (http://jmlr.csail.mit.edu/papers/volume13/desmedt12a/desmedt12a.pdf)
         I remove any suggested antonyms that are short phrases (eg Wordnet suggests that "take_away" is an antonym of "add", however adding a work word change the structure of the sentence.)
     I substitute the antonyms in selecting randomly if there are multiple choices. (I still need to decide how many, putting in an even number often results in a double negative)
    I repair the indefinite articles ('an' vs 'a')
    I check the final sentence by sending it through the POS tagger and seeing if I get the same tags.


This final step is not perfect. Its not bad though.
It got a lot better when I changed to using the Stanford POS tagger, as it was more able to tag and retag correctly and thus was most consistent.

I have attached a (printout) of my method script.  At the the bottom you can see some examples of it's use

In [1]:
from __future__ import print_function
from __future__ import unicode_literals

import nltk
from nltk.corpus import wordnet as wn

import pattern.en as en

import itertools
import random
import copy

In [2]:
import os
from nltk.parse import stanford


In [3]:
from nltk.tag.stanford import POSTagger
os.environ['CLASSPATH'] = '/home/wheel/oxinabox/nltk_data/standford_models/stanford-postagger/stanford-postagger.jar'
os.environ['STANFORD_MODELS'] = '/home/wheel/oxinabox/nltk_data/standford_models/stanford-postagger/models/'

standford_pos_tagger = POSTagger("english-bidirectional-distsim.tagger")
def pos_tag(words):
    return standford_pos_tagger.tag(words)[0]
    
def tok_and_tag(sent):
    return pos_tag(nltk.tokenize.word_tokenize(sent))


In [4]:
referenced = en.referenced("yak") #Smarter than
referenced.split()

[u'a', u'yak']

In [5]:
def fix_indefinite_articles(words):
    """Alters a list of words in place so that the indefinate articles are correct. Eg replacing "An man" with "A man" """
    for ii in range(0,len(words)-1): #don't do the last word, as it can't ne an 'an' or an 'a'
        if words[ii] in frozenset(['an','a','An','A']):
            referenced_form = en.referenced(words[ii]) #Smarter than simple vowel match eg "a yak" not "an yak"
            replacement_article = referenced_form.split()[0] #'an' or 'a'
            if words[ii][0]=='A':
                replacement_article[0] == 'A' #Uppercase it
            words[ii]=replacement_article

    return words
    

In [6]:
def unstem_fun(pos_tag):
    """
    Handles the restemming of a particular POS tag after it has been converted to a Stem via wordnet lemmaisation.

    pos_tag is from https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html eg VBD
    
    This can be extended as required from https://www.nodebox.net/code/index.php/Linguistics
    """

    unstem_funs = {frozenset(['NNS', 'NNPS']) : en.pluralize,
                  frozenset(['RBR', 'JJR']) : en.comparative,
                  frozenset(['JJS']) : en.superlative, #Skip RBS, as ("Most") not changed by WordNet
                  frozenset(['VBD', 'VBN']) :  lambda w: en.conjugate(w, en.PAST), # A lot more of these can be made with en.conugate
                  frozenset(['VBG']) :  lambda w: en.conjugate(w,en.PRESENT,aspect=en.PROGRESSIVE ),
                 }
    
    for category in unstem_funs.keys():
        if pos_tag in category:
            return unstem_funs[category]
    else:
        return lambda x: x
        


In [7]:
def get_all_antonyms(word, pos=None):
    synsets = wn.synsets(word, pos=pos)
    for synset in synsets:
        for lemma in synset.lemmas():
            for anto in lemma.antonyms():
                yield anto.name()


In [8]:
#These constants define the types that I am interested in, as well as what POS tags they have for what wordnet tags
NOUN_POS_TAGS = frozenset(["NN", "NNS"])
ADJ_POS_TAGS = frozenset(["JJ","JJS", "JJR", "VBN"]) #VBN is here because it is hard to tell the difference between a VERB PAST PARTICPANT and an ADJECTIVE
VERB_POS_TAGS = frozenset(["VB","VBS", "VBN","VBG", "VBD"]) 
ADVERB_POS_TAGS = frozenset(["RB","RBS"])

banned_inputs_to_sub = frozenset(["had", "were", "have", "be", "was"]) #Changing these words tends to have huge impact on sentence, and they are had to change correctly

def get_pos_sub_function(pos_tag_set, wordnet_tag):
    def inner(tagged_words):
        for ii,(pword,p_pos_tag) in enumerate(tagged_words):
            if p_pos_tag in pos_tag_set and not(pword in banned_inputs_to_sub):
                unstem = unstem_fun(p_pos_tag)

                antos =  get_all_antonyms(pword, wordnet_tag)
                antos = map(unstem,antos)
                antos = filter(lambda w:not('_' in w), antos) #some WordNet lemmas are not single words. We don't use them.
                antos = list(antos)
                if len(antos)>0:
                    yield(ii, antos)
    return inner


#Define the functions: all take sequence of words as parameter
get_noun_subs = get_pos_sub_function(NOUN_POS_TAGS, wn.NOUN)
get_adj_subs = get_pos_sub_function(ADJ_POS_TAGS, wn.ADJ)
get_verb_subs = get_pos_sub_function(VERB_POS_TAGS, wn.VERB)
get_adverb_subs = get_pos_sub_function(ADVERB_POS_TAGS, wn.ADV)

In [9]:
def semantic_corruptions(sent):
    words = nltk.tokenize.word_tokenize(sent)
    tagged_words = pos_tag(words)
    corruptions = dict(itertools.chain(get_adj_subs(tagged_words),
                    get_noun_subs(tagged_words),
                    get_adverb_subs(tagged_words),
                    get_verb_subs(tagged_words),
                   ))
    for corrupt_index in corruptions.keys():
        antos = corruptions[corrupt_index]
        anto_index = random.randint(0,len(antos)-1)
        words[corrupt_index] = antos[anto_index]
    fix_indefinite_articles(words)
    return " ".join(words)
    
    


In [10]:
def checked_corruption(sent):
    original_words, original_pos_tags = zip(*tok_and_tag(sent))
    corrupted_sent = semantic_corruptions(sent)
    corupted_words, corrupted_pos_tags = zip(*tok_and_tag(corrupted_sent))
    
    for (ii,(o_tag,c_tag)) in enumerate(zip(original_pos_tags,corrupted_pos_tags)):
        if o_tag != c_tag:
            print("failed on: (%s) %s  != %s (%s)" % (original_words[ii], o_tag, c_tag, corupted_words[ii]))
       
    return corrupted_sent
    

In [11]:
checked_corruption("The article is the most common determiner (DT) in English.")

failed on: (most) RBS  != JJS (least)


u'The article is the least individual determiner ( DT ) in English .'

In [12]:
checked_corruption("We may have a question")

u'We may have an answer'

In [13]:
sent="Is changing an odd number of verbs, adverbs, adjectives and nouns to their antonyms expected to produce a semantically distant sentence?"
checked_corruption(sent)


failed on: (expected) VBN  != JJ (unexpected)


u'Is staying an even number of verbs , adverbs , adjectives and nouns to their synonyms unexpected to produce an semantically close acquittal ?'

In [14]:
checked_corruption("Both gerunds and infinitives can be used as the subject or the complement of a sentence.")

u'Both gerunds and infinitives can be misused as the subject or the complement of an acquittal .'

In [15]:
sent = "Shares of Xoma fell 16 percent in early trade, while shares of Genentech, a much larger company with several products on the market, were up 2 percent."
checked_corruption(sent)

u'Shares of Xoma ascended 16 percent in middle trade , while shares of Genentech , an much less company with several products on the market , were down 2 percent .'

In [16]:
checked_corruption("Six months ago, the IMF and Argentina struck a bare-minimum $6.8-billion debt rollover deal that expires in August.")

u'Six months ago , the IMF and Argentina missed an bare-minimum $ 6.8-billion debt rollover deal that expires in August .'

In [17]:
checked_corruption("He plans to have dinner with troops at Kosovo's U.S. military headquarters, Camp Bondsteel.")

u"He plans to have dinner with troops at Kosovo 's U.S. unmilitary headquarters , Camp Bondsteel ."

In [18]:
checked_corruption("After that, he plans to have dinner at Camp Bondsteel with U.S. troops stationed there.")

failed on: (stationed) VBD  != VBN (stationed)


u'After that , he plans to have dinner at Camp Bondsteel with U.S. troops stationed here .'

In [19]:
checked_corruption("He added that prosecutors will seek the death penalty.")

u'He subtracted that prosecutors will seek the birth reward .'

In [20]:
checked_corruption("Who is this man?")

u'Who is this woman ?'

In [21]:
checked_corruption("Who is that man?")

u'Who is that woman ?'

In [22]:
checked_corruption("This evil thief stole that car!")

u'This good thief stole that car !'

In [23]:
checked_corruption("The motorist is angry, so the pedestrian is understandably scared")

u'The motorist is unangry , so the pedestrian is unintelligibly scared'