
Machine learning embedding middleware #152

Open
2 tasks done
KOLANICH opened this issue Feb 17, 2019 · 2 comments
Labels
Advanced Projects that require a high level of understanding of the topics specified, or programming in gnrl. Much work This project takes little time to complete. (ETA several weeks+)

Comments

KOLANICH commented Feb 17, 2019

Project description

There are many neural networks focused on tasks similar to machine translation, but not limited to text:

  • transforming text into speech audio
  • transcribing speech into text
  • describing pictures with text
  • drawing pictures based on text descriptions
  • changing voice
  • replacing faces
  • drawing faces based on principal vectors in latent space
  • machine translation from one language into another
  • creating HTML code based on a picture drawn by a designer
  • picture style transfer

They usually work the following way:

1. An encoder network encodes the initial representation into a vector in a latent space.
2. A decoder network decodes that feature vector into the desired output.

Encoders and decoders are trained simultaneously and differ from model to model, as do the internal representations.

The idea is to take as many different nets as possible, cut each in the middle where the internal representation appears, attach a small net to each (let's call it a transcoder), connect the different nets, and train the transcoders to converge to the same representation. Then standardize the meaning of that internal vector. After that we should be able to use combinations of sources and sinks available in no individual model.
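The cut-and-stitch idea can be sketched with plain Python callables standing in for networks. Everything below is a hypothetical toy: real encoders/decoders would be neural nets and the transcoders would be trained, not hand-written.

```python
# Toy sketch of the cut-and-stitch idea: two "models" are split into
# encoder and decoder halves, and small transcoders map each model's
# private latent space to a shared "global" representation.
# All names here are made up for illustration.

def tts_encoder(text):            # text -> TTS-private latent
    return [float(ord(c)) for c in text[:4]]

def painter_decoder(latent):      # image-private latent -> "image"
    return "image(" + ",".join(f"{x:.0f}" for x in latent) + ")"

def tts_to_global(latent):        # transcoder: TTS latent -> global
    return [x / 100.0 for x in latent]

def global_to_painter(latent):    # transcoder: global -> image latent
    return [x * 2.0 for x in latent]

def pipeline(text):
    """Chain: TTS encoder -> transcoder -> transcoder -> Painter decoder.
    Neither original model alone could go from text to image."""
    return painter_decoder(global_to_painter(tts_to_global(tts_encoder(text))))

print(pipeline("hiya"))  # -> image(2,2,2,2)
```

The point of the shared middle is exactly this composability: any encoder whose latent can be transcoded to the global representation can feed any decoder that can be transcoded from it.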

Notation

Let's call the types of content at the ends of a network modalities:

  • seq<char, en> (a seq of characters, forming a text in English)
  • seq<char, ru>
  • seq<word_vec<...>, en>
  • picture
  • sound features
  • source code features
  • face features

Let's call intermediate representations transcoding modalities.

Let's call a model transforming between transcoding modalities a transcoder.

In the following text we use oriented graphs. An edge A -> [M] -> B means that a model M (by "model" we mean something through which we can compute gradients) transforms data in modality A into modality B. An edge A => [O] => B means it is possible to convert data in modality A into modality B using a black-box oracle O, such as a TTS program or an external API. We may call an edge a model or an oracle accordingly.
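The graph described above can be represented directly; here is a minimal sketch (all class and function names are hypothetical, not part of any real API):

```python
# Minimal sketch of the modality graph: nodes are modality ids, edges
# are either differentiable models or black-box oracles. The "kind"
# field records whether gradients can flow through the edge.

from collections import defaultdict

class ModalityGraph:
    def __init__(self):
        self.edges = defaultdict(list)  # src -> [(dst, kind, name)]

    def add_model(self, src, dst, name):
        # A -> [M] -> B : gradients can flow through a model edge
        self.edges[src].append((dst, "model", name))

    def add_oracle(self, src, dst, name):
        # A => [O] => B : a black box (e.g. external TTS), no gradients
        self.edges[src].append((dst, "oracle", name))

g = ModalityGraph()
g.add_model("text<en>", "{..EnTTSLatent..}", "EnTTS")
g.add_oracle("text<en>", "speech<en>", "TTS")
```

Distinguishing model edges from oracle edges matters later: training can backpropagate through model edges, while oracle edges can only be used to transform data offline.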

API

ModalityId is an identifier of a modality. The standardized global representation has the identifier global. Then there are modality-specific representations, whose names are also standardized, like speech_latent, image_latent, etc.

When installed, the middleware exposes the API to applications:

getInfo(out info) - returns information about the installed models, oracles and their available modalities.

startTransaction(out txId) - starts a transaction. All model registration and removal is done within transactions. At the end of a transaction the middleware checks consistency and removes all transcoders ending in unreferenced transcoding modalities.

commit(in txId, out error) - checks consistency and commits
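The commit-time cleanup described above (dropping transcoders that end in unreferenced transcoding modalities) could be sketched like this; the data structures are hypothetical stand-ins:

```python
# Hypothetical sketch of the commit-time cleanup: any edge whose output
# is a transcoding modality referenced by no other edge is removed,
# repeatedly, until the graph is stable.

def prune_dangling_transcoders(edges, transcoding_modalities):
    """edges: list of (src, dst, name); transcoding_modalities: set of ids."""
    edges = list(edges)
    while True:
        referenced = {src for src, _, _ in edges}
        dangling = [e for e in edges
                    if e[1] in transcoding_modalities and e[1] not in referenced]
        if not dangling:
            return edges
        edges = [e for e in edges if e not in dangling]

edges = [
    ("text<en>", "latentA", "Enc"),
    ("latentA", "speech<en>", "Dec"),
    ("latentA", "orphan_latent", "T1"),  # nothing consumes orphan_latent
]
pruned = prune_dangling_transcoders(edges, {"latentA", "orphan_latent"})
```

The loop is needed because removing one dangling edge can make another modality unreferenced in turn.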

registerModel(in our_model_id, in inModalityId, in outModalityId, in our_model)
adds a model to the registry. our_model_id is the globally unique model id. Transcoder networks are registered the same way as modality networks, but UUIDs are used as their ids; when a UUID is used, the server recognizes the modality as a transcoding modality.
our_model is the model serialized, for example, into ONNX or CNTK format.

unregisterModel(in netId) - removes the model.

registerModalityType(in name, in parameterCount)

registerModality(in type, in arguments)

registerDatasetType(in name, in modalityType, in modalityType)

registerOracleType(in sourceModalityType, in targetModalityType)

registerOracle(in oracleType, in entryPoint)

unregisterOracle(in oracleId)

getPath(in inModalityId, in outModalityId, in pieces, in preference, out path) - finds the optimal path (a sequence of models) between two modalities, using pieces to require certain networks and representations to be in the path and preference as the criterion of optimality.

convert(in inputRepresentation, in inputModalityId, in outModalityId, in path, in returnGradients, out outputRepresentation) - converts the input into output. If returnGradients is true, gradients are returned.

getModel(in netId, out net) - returns the serialized model. Used for training transcoders.

getTrainingOpportunities(in inModality, in outModality, out trainingOpportunities) - returns training opportunities.
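Under the semantics above, getPath is essentially a shortest-path search over the modality graph, and convert chains the edge transforms along the returned path. A minimal BFS sketch, with plain functions standing in for models (no real middleware is assumed):

```python
# Minimal sketch of getPath/convert: breadth-first search over the
# modality graph, then function chaining along the found path.
# Edge functions here are toy stand-ins for real models.

from collections import deque

def get_path(edges, src, dst):
    """edges: {src: [(dst, fn), ...]}. Returns the list of edge fns, or None."""
    prev = {src: None}
    q = deque([src])
    while q:
        cur = q.popleft()
        if cur == dst:
            break
        for nxt, fn in edges.get(cur, []):
            if nxt not in prev:
                prev[nxt] = (cur, fn)
                q.append(nxt)
    if dst not in prev:
        return None
    path, node = [], dst
    while prev[node] is not None:
        parent, fn = prev[node]
        path.append(fn)
        node = parent
    return list(reversed(path))

def convert(value, path):
    for fn in path:
        value = fn(value)
    return value

edges = {
    "text<en>": [("latent", str.upper)],
    "latent":   [("speech<en>", lambda s: s + "!")],
}
path = get_path(edges, "text<en>", "speech<en>")
result = convert("hello", path)  # -> "HELLO!"
```

A real getPath would also weight edges (preference) and force required nodes into the path (pieces), which turns the BFS into a constrained shortest-path problem, but the basic shape is the same.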

Training opportunities

In order to train a model one needs a dataset. Datasets can be obtained in different ways. getTrainingOpportunities is a function that analyses the graph of available models and oracles and returns the candidates useful for training a specific model between two modalities.

If G is the graph and we want to train a model A -> [M] -> B, where A and B are subgraphs of G, then a suitable dataset (a, b) (where a and b are modalities) is one satisfying the following conditions:

  • a \in V_A, b \in V_B
  • for all b'', b' \in V_B there exists an edge E_{b'', b'} which is a model.

Then the middleware matches the endpoints against the available dataset types.

A TrainingOpportunity is an object containing a learning graph and the datasets matching this graph.

Training software should use training opportunities to obtain the needed datasets (for example, from a repository of datasets like OpenML, or using a database that instructs the software how to retrieve and preprocess datasets, or by asking a user to do that) and then train the needed model.
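A TrainingOpportunity could be modelled as a small record, and getTrainingOpportunities as a search that matches registered dataset types against modalities reachable from the two ends of the model being trained. A rough sketch under these assumptions (all names hypothetical, and the stored paths are simplified to endpoint pairs):

```python
# Rough sketch of getTrainingOpportunities: for each registered dataset
# type (a pair of modalities), check whether the dataset's first modality
# can reach the model's input and the model's output can reach the
# dataset's second modality.

from dataclasses import dataclass

@dataclass
class TrainingOpportunity:
    dataset_type: tuple  # (modality_a, modality_b)
    path_in: list        # simplified: just the endpoints a -> m_in
    path_out: list       # simplified: just the endpoints m_out -> b

def reachable(edges, src, dst, seen=None):
    """Depth-first reachability over a {src: [dst, ...]} adjacency dict."""
    seen = set() if seen is None else seen
    if src == dst:
        return True
    seen.add(src)
    return any(reachable(edges, n, dst, seen)
               for n in edges.get(src, []) if n not in seen)

def get_training_opportunities(edges, dataset_types, m_in, m_out):
    ops = []
    for a, b in dataset_types:
        if reachable(edges, a, m_in) and reachable(edges, m_out, b):
            ops.append(TrainingOpportunity((a, b), [a, m_in], [m_out, b]))
    return ops

edges = {"text<en>": ["{..EnTTSLatent..}"], "text_latent": ["text<en>"]}
ops = get_training_opportunities(
    edges, [("text<en>", "text<en>")], "{..EnTTSLatent..}", "text_latent")
```

A full implementation would return the actual paths (so the trainer knows which frozen models and oracles to chain), not just reachability, but the matching logic is the same.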

Example

Assume we have the following models installed (all model names are arbitrary):

  • text<en> -> [Painter] -> [PainterTranscoderText] -> text_latent <-> global <-> image_latent -> [PainterTranscoderImage] -> [Painter] -> image

  • speech<ru> -> [RuSpeechRecog] -> [RuSpeechRecogTranscoder] -> speech_latent <-> global <-> text_latent -> [RuSpeechRecogTranscoder] -> [RuSpeechRecog] -> text<ru>

And oracles registered:

  • text<α> => [TTS] => speech<α>
  • text<α> => [machine translation] => text<β>

And datasets types registered:

  • text<α> - text_corpuses<α> (Wikipedia, Gutenberg, texts in Internet)
  • text<α>, text<b> - bilingval_sources<α, β>
  • speech<α> - speech corpuses<α> (videohostings, podcasts, songs)
  • speech<α>, text<α> - transcribed<α> videos and audios

And there exists a pretrained model text<en> -> [EnTTS] -> {..EnTTSLatent..} -> {..EnTTSLatent..} -> [EnTTS] -> speech<en>.

We want to create two transcoders:

  • {..EnTTSLatent..} -> [EnTTSTextTranscoder] -> text_latent
  • speech_latent -> [EnTTSSpeechTranscoder] -> {..EnTTSLatent..}

So for the first transcoder we call

trainingOpportunities =  getTrainingOpportunities("{..EnTTSLatent..}" , "text_latent")

and should get

  • text<en> -> [EnTTS] -> {..EnTTSLatent..} -> [EnTTSTextTranscoder] -> text_latent <- [Painter] <- text<en> ( text_corpuses<en>, bilingval_sources<en, β>, transcribed<en> )

  • text<ru> => [machine translation] => text<en> -> [EnTTS] -> {..EnTTSLatent..} -> [EnTTSTextTranscoder] -> text_latent -> [RuSpeechRecogTranscoder] -> [RuSpeechRecog] -> text<ru> ( text_corpuses<ru>, bilingval_sources<ru, β>, transcribed<ru> )

  • text<en> -> [EnTTS] -> {..EnTTSLatent..} -> [EnTTSTextTranscoder] -> text_latent -> [RuSpeechRecogTranscoder] -> [RuSpeechRecog] -> text<ru> ( bilingval_sources<en, ru> )

  • text<ru> => [machine translation] => text<en> -> [EnTTS] -> {..EnTTSLatent..} -> [EnTTSTextTranscoder] -> text_latent <- [Painter] <- text<en> (bilingval_sources<en, ru>)

Example usage

Text to speech

path = server.getPath("text<en>", "speech<en>")
hwSound = server.convert("hello world", path)
... # play sound

Creating a transcoder

... #load model
... #cut it in the middle: EnTTSText , EnTTSSpeech
txId=server.startTransaction()
server.registerModel("EnTTS", "text<en>", "{..EnTTSLatent..}", EnTTSText)
server.registerModel("EnTTSTextTranscoder", "{..EnTTSLatent..}", "text_latent", None) # model is yet to be trained
server.registerModel("EnTTSSpeechTranscoder", "speech_latent", "{..EnTTSLatent..}", None) # model is yet to be trained
server.registerModel("EnTTS", "{...}", "speech<en>", EnTTSSpeech)
server.commit(txId)

def train(m1, m2, hyperparams):
    trainingOpportunities = server.getTrainingOpportunities(m1, m2)
    metatrainer = MetaTrainer(m1, m2, hyperparams)

    def transformPath(bigNet, p):
        # stack frozen pretrained models along the path;
        # oracles cannot pass gradients, so they transform the data instead
        for link in p:
            if isinstance(link, ModelLink):
                bigNet.stack(server.getModel(link.modelId), frozen=True)
            elif isinstance(link, OracleLink):
                metatrainer.transformedData[link.dstModality] = link.oracle.transform(
                    metatrainer.transformedData[link.srcModality])

    def transformPart(bigNet, ds, m, isSecondPart):
        # connect each modality present in the dataset to the end m of the core
        for dsModality in ds:
            p = findPath(m, dsModality) if isSecondPart else findPath(dsModality, m)
            if not p:
                continue
            metatrainer.transformedData[dsModality] = ds[dsModality]
            transformPath(bigNet, p)

    for op in trainingOpportunities:
        retriever.retrieve(op.datasets)
        for ds in op.datasets:
            bigNet = metatrainer.createFlow()
            transformPart(bigNet, ds, m1, False)
            bigNet.stack(metatrainer.core, frozen=False)  # only the core is trained
            transformPart(bigNet, ds, m2, True)

    metatrainer.train()
    return metatrainer.core

trained=train("{..EnTTSLatent..}" , "text_latent", ....)
txId=server.startTransaction()
server.registerModel("EnTTSTextTranscoder", "{..EnTTSLatent..}", "text_latent", trained)
server.commit(txId)

trained=train( "speech_latent", "{..EnTTSLatent..}" , ....)
txId=server.startTransaction()
server.registerModel("EnTTSSpeechTranscoder", "speech_latent", "{..EnTTSLatent..}", trained)
server.commit(txId)

Complexity and required time

Complexity

  • Advanced - The project requires the user to have a good understanding of all components of the project to contribute

Required time (ETA)

  • Much work - The project will take more than a couple of weeks and serious planning is required
@Kreijstal (Contributor) commented:

Well, this idea is great. Now the problem is collecting data; what about making a project based around the collection of data?

@KOLANICH (Author) commented:

> Well this idea is great

I am not sure. I am not a NN pro, maybe the idea is useless junk.

> now the problem is collecting data, what about making project based around the collection of data

We don't need to train models from scratch. The proposal is to use pretrained models, cut them in the middle, and add small adapter networks. They are not as large as full networks, so they don't need as much data for training.

@FredrikAugust FredrikAugust added Advanced Projects that require a high level of understanding of the topics specified, or programming in gnrl. Much work This project takes little time to complete. (ETA several weeks+) labels May 15, 2019