Balance data for latest version #76

Closed
Nixellion opened this issue Oct 26, 2017 · 5 comments

@Nixellion

I tried to adapt the balance data code for the latest version, the one that supports these outputs:

w = [1, 0, 0, 0, 0, 0, 0, 0, 0]
s = [0, 1, 0, 0, 0, 0, 0, 0, 0]
a = [0, 0, 1, 0, 0, 0, 0, 0, 0]
d = [0, 0, 0, 1, 0, 0, 0, 0, 0]
wa = [0, 0, 0, 0, 1, 0, 0, 0, 0]
wd = [0, 0, 0, 0, 0, 1, 0, 0, 0]
sa = [0, 0, 0, 0, 0, 0, 1, 0, 0]
sd = [0, 0, 0, 0, 0, 0, 0, 1, 0]
nk = [0, 0, 0, 0, 0, 0, 0, 0, 1]

But when I try to train such a model, I get "value out of range" errors. I suspect I am doing something wrong in balance data.

Is there updated code anywhere? Or am I not supposed to balance the data myself? Without balancing, the neural net always goes forward.

@Nixellion
Author

Oh, OK. It seems the actual problem is that collect_data only records two states: the default (all zeroes) and w. So it just "balances" the data down to the smallest array, which is empty. Huh.
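
One way to confirm this kind of problem before balancing is to count how many samples each class has. The following is a minimal sketch, assuming each sample is an `[img, choice]` pair with `choice` a 9-element one-hot list as listed above. `LABELS` and `class_counts` are hypothetical names used for illustration, not part of the project.

```python
from collections import Counter

# Class names in one-hot order, as listed at the top of this issue.
LABELS = ['w', 's', 'a', 'd', 'wa', 'wd', 'sa', 'sd', 'nk']


def class_counts(samples):
    """Count samples per class; labels that are not one-hot go under 'invalid'."""
    counts = Counter({name: 0 for name in LABELS})
    counts['invalid'] = 0
    one_hot_sorted = [0] * (len(LABELS) - 1) + [1]
    for _img, choice in samples:
        choice = list(choice)
        if len(choice) == len(LABELS) and sorted(choice) == one_hot_sorted:
            counts[LABELS[choice.index(1)]] += 1
        else:
            counts['invalid'] += 1
    return counts


# Toy data mimicking the situation described: only all-zero labels and 'w'.
samples = [
    (None, [1, 0, 0, 0, 0, 0, 0, 0, 0]),
    (None, [0, 0, 0, 0, 0, 0, 0, 0, 0]),
    (None, [1, 0, 0, 0, 0, 0, 0, 0, 0]),
]
counts = class_counts(samples)
empty = [name for name in LABELS if counts[name] == 0]
print(dict(counts))
print('classes with no samples:', empty)
```

If any class shows up in the `empty` list, trimming every class to the smallest one will produce an empty balanced file.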

@frossaren

Ahhh, so that's probably why I have huge problems getting it to work. By the way, do we know whether this project is dead? Sentdex hasn't approved any PRs or added anything in a long time.

@Nixellion
Author

I don't know, but the stream with this bot runs pretty much 24/7.

@kymckay

kymckay commented Dec 6, 2017

For anyone looking at this in the future, I have a slightly rewritten balance_data.py that handles an arbitrary number of choices and also repacks files below a specified threshold of training data. Gist here.
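
For readers without access to the gist, balancing over an arbitrary number of choices can be sketched roughly as follows. This is not the code from the gist. It is a minimal illustration that assumes samples are `[img, choice]` pairs, and the `balance` function name is hypothetical. It balances only across classes that actually occur in the data and does not cover the file repacking mentioned above.

```python
import random
from collections import defaultdict


def balance(samples, rng=random):
    """Trim every class that occurs to the size of the rarest one, then shuffle."""
    buckets = defaultdict(list)
    for img, choice in samples:
        # Use the label itself as the key, so any number of classes works.
        buckets[tuple(choice)].append([img, list(choice)])
    if not buckets:
        return []
    smallest = min(len(bucket) for bucket in buckets.values())
    balanced = []
    for bucket in buckets.values():
        # Keep a random subset of each class, all of the same size.
        balanced.extend(rng.sample(bucket, smallest))
    rng.shuffle(balanced)
    return balanced


# Toy data: 5 samples of one class, 2 of another, 3 of a third.
toy = (
    [(None, [1, 0, 0])] * 5
    + [(None, [0, 1, 0])] * 2
    + [(None, [0, 0, 1])] * 3
)
result = balance(toy)
print(len(result))
```

Because absent classes never get a bucket here, a class with zero samples does not wipe out the output, but the model will still never see that class during training.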

@Phillyclause89

This is how I went about modifying balance_data.py to balance across the 9 possible choices:

import numpy as np
import pandas as pd
from collections import Counter
from random import shuffle
import random

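random.seed()
FILE_I_END = 7  # index of the last training_data-N.npy file to process
offset = 10  # added to the file index when naming the balanced output

# Process the files in random order.
data_order = list(range(1, FILE_I_END + 1))
shuffle(data_order)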
for count, i in enumerate(data_order):
    try:
        random.seed()
        file_name = 'training_data-{}.npy'.format(i)
        # full file info
        train_data = np.load(file_name, allow_pickle=True)
        print('training_data-{}.npy'.format(i), len(train_data))
        df = pd.DataFrame(train_data)
        print(df.head())
        print(Counter(df[1].apply(str)))
        # One bucket per class, indexed by the position of the 1 in the
        # one-hot label (order: w, s, a, d, wa, wd, sa, sd, nk).
        buckets = [[] for _ in range(9)]
        for data in train_data:
            img = data[0]
            choice = list(data[1])
            if len(choice) == 9 and sorted(choice) == [0] * 8 + [1]:
                buckets[choice.index(1)].append([img, choice])
            else:
                print('no matches')

        # Shuffle each bucket once (not after every append), then trim all
        # buckets to the size of the smallest class.
        for bucket in buckets:
            shuffle(bucket)
        smallest = min(len(bucket) for bucket in buckets)
        if smallest == 0:
            # A class with no samples makes the balanced output empty.
            print('at least one class has no samples in', file_name)

        final_data = [sample for bucket in buckets for sample in bucket[:smallest]]
        shuffle(final_data)
        np.save('balanced_training_data-{}.npy'.format(i+offset), final_data)

    except Exception as e:
        print(str(e))
