High max error #1

Closed
snakers4 opened this Issue Dec 5, 2017 · 8 comments

snakers4 commented Dec 5, 2017

I ran /tests/resnet50.py and /tests/alexnet.py.

For resnet50 I got this error:

1.88754e+08
Max error: 1079778560.0

For alexnet I got this error:

0.062452
Max error: 0.11586721986532211

Is this expected behavior, or is there anything wrong with my versions / set-up?
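For context, a "Max error" figure of this kind is usually the largest absolute elementwise difference between the PyTorch output and the converted Keras output for the same input. The sketch below shows that computation in plain Python. `flatten` and `max_abs_error` are hypothetical helpers written for illustration; the repository's test scripts may compute the number differently.

```python
def flatten(values):
    """Yield every scalar from an arbitrarily nested list/tuple."""
    for v in values:
        if isinstance(v, (list, tuple)):
            yield from flatten(v)
        else:
            yield float(v)


def max_abs_error(reference, converted):
    """Largest absolute elementwise difference between two outputs."""
    ref = list(flatten(reference))
    conv = list(flatten(converted))
    if len(ref) != len(conv):
        raise ValueError("outputs have different sizes")
    return max(abs(a - b) for a, b in zip(ref, conv))


# Made-up outputs of the same model in two frameworks (illustration only).
pytorch_out = [[0.5, 0.25], [0.125, 1.0]]
keras_out = [[0.5, 0.25], [0.125, 0.75]]
print(max_abs_error(pytorch_out, keras_out))  # 0.25
```

Values many orders of magnitude above float32 rounding error, like the ones reported here, usually point to a real mismatch between the two models rather than numerical noise.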

snakers4 Dec 5, 2017

For resnet18

13452.2
Max error: 32325.888671875

snakers4 Dec 5, 2017

Also, when converting my own model I got this error; maybe padding conversion is the issue:

ValueError: Unsuported padding size for convolution

I guess this is because my architecture is an inception4 architecture: it has non-symmetric filters, and padding for them is handled differently in Keras and PyTorch. Which solution would you suggest for this edge case?

snakers4 Dec 5, 2017

I worked around the inception error with the following modification, but it is probably not entirely correct:

        if node.padding[0] != node.padding[1]:
            # originally this line was not commented
            # raise ValueError('Unsuported padding size for convolution')
            
            # quick fix for inception architectures
            # refer here for more info https://github.com/fchollet/keras/blob/master/keras/applications/inception_v3.py
            border_mode = 'same'
        else:
            # this code initially was under no condition
            padding = node.padding[0]
            if padding > 0:
                padding_name = output_name + '_pad'
                padding_layer = keras.layers.ZeroPadding2D(
                    padding=node.padding,
                    name=padding_name
                )
                layers[padding_name] = padding_layer(layers[input_name])
                input_name = padding_name      
                
            # this line below also was applied unconditionally
            border_mode = 'valid'
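Whether `'same'` is a safe substitute here depends on kernel size and stride. A minimal sketch of the padding arithmetic follows, using the standard convolution output-size formula and TensorFlow's rule for `'same'` padding. Both helper names are hypothetical and nothing in it is taken from the converter's code.

```python
import math


def explicit_pad_output(size, kernel, pad, stride=1):
    """Output length with symmetric explicit padding (PyTorch-style)."""
    return (size + 2 * pad - kernel) // stride + 1


def same_padding(size, kernel, stride=1):
    """(before, after) padding that TensorFlow's 'same' mode applies."""
    out = math.ceil(size / stride)
    total = max((out - 1) * stride + kernel - size, 0)
    return total // 2, total - total // 2


# 1x7 kernel, stride 1: 'same' pads (0, 0) on one axis and (3, 3) on the
# other, which matches PyTorch padding=(0, 3).
print(same_padding(17, 1), same_padding(17, 7))      # (0, 0) (3, 3)

# 3x3 kernel, stride 2, even-sized input: 'same' pads (0, 1), while
# PyTorch padding=1 pads (1, 1). The output length is 112 in both cases.
print(same_padding(224, 3, stride=2))                # (0, 1)
print(explicit_pad_output(224, 3, pad=1, stride=2))  # 112
```

With stride 1 and odd kernels, `'same'` reproduces PyTorch's symmetric `(k - 1) / 2` padding on each axis, so the quick fix gives matching values for those layers. With stride 2 the output shape can still match while the padding is shifted by one pixel, which changes the values.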

nerox8664 Dec 5, 2017

Owner

Hello @snakers4.

I have very different maximal errors:
For the ResNet18 ~ 2e-06.
For the ResNet50 ~8e-5.
For the AlexNet ~1e-8.

> Is this expected behavior, or is there anything wrong with my versions / set-up?

It seems something is wrong. Can you check your Keras backend (it should be tensorflow)? The Keras config is located in ~/.keras/keras.json.

I've tested the converter with these versions:

  • Keras version: 2.1.1
  • TensorFlow version: 1.4.0
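One way to check the backend without starting a full Keras session is to read the config file directly. This is a sketch under the assumption of multi-backend Keras 2.x behaviour, where the `KERAS_BACKEND` environment variable overrides the file and the defaults are `tensorflow` and `channels_last`. `read_keras_config` and `effective_backend` are hypothetical helper names.

```python
import json
import os


def read_keras_config(path="~/.keras/keras.json"):
    """Return the Keras config as a dict, or {} if the file is missing."""
    path = os.path.expanduser(path)
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def effective_backend(config, environ=os.environ):
    """KERAS_BACKEND overrides the file; Keras 2.x defaults to tensorflow."""
    return environ.get("KERAS_BACKEND") or config.get("backend", "tensorflow")


config = read_keras_config()
print("backend:", effective_backend(config))
print("image_data_format:", config.get("image_data_format", "channels_last"))
```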

snakers4 Dec 5, 2017

Hi,

Here is my keras.json

{
    "floatx": "float32",
    "backend": "tensorflow",
    "image_data_format": "channels_first",
    "epsilon": 1e-07
}

My versions are

  • tensorflow-gpu (1.3.0)
  • keras 2.0.8
  • pytorch
  • torch (0.2.0.post4)

I am running all of this inside a docker container, but I guess this should not really affect anything.
Just for completeness, my Dockerfile is below

# DOCKER FILE START 
FROM nvidia/cuda:8.0-cudnn6-devel

RUN apt-get update && apt-get install -y openssh-server

RUN apt-get install -y unrar-free && \
    apt-get install -y p7zip-full

RUN mkdir /var/run/sshd
RUN echo 'root:Ubuntu@41' | chpasswd
RUN sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config

# SSH login fix. Otherwise user is kicked off after login
RUN sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd

ENV NOTVISIBLE "in users profile"
RUN echo "export VISIBLE=now" >> /etc/profile

ENV CONDA_DIR /opt/conda
ENV PATH $CONDA_DIR/bin:$PATH

# writing env variables to /etc/profile as mentioned here https://docs.docker.com/engine/examples/running_ssh_service/#run-a-test_sshd-container
RUN echo "export CONDA_DIR=/opt/conda" >> /etc/profile
RUN echo "export PATH=$CONDA_DIR/bin:$PATH" >> /etc/profile

RUN mkdir -p $CONDA_DIR && \
    echo export PATH=$CONDA_DIR/bin:'$PATH' > /etc/profile.d/conda.sh && \
    apt-get update && \
    apt-get install -y wget git libhdf5-dev g++ graphviz openmpi-bin nano && \
    wget --quiet https://repo.continuum.io/miniconda/Miniconda3-4.2.12-Linux-x86_64.sh && \
    echo "c59b3dd3cad550ac7596e0d599b91e75d88826db132e4146030ef471bb434e9a *Miniconda3-4.2.12-Linux-x86_64.sh" | sha256sum -c - && \
    /bin/bash /Miniconda3-4.2.12-Linux-x86_64.sh -f -b -p $CONDA_DIR && \
    ln /usr/lib/x86_64-linux-gnu/libcudnn.so /usr/local/cuda/lib64/libcudnn.so && \
    ln /usr/lib/x86_64-linux-gnu/libcudnn.so.6 /usr/local/cuda/lib64/libcudnn.so.6 && \
    ln /usr/include/cudnn.h /usr/local/cuda/include/cudnn.h  && \
    rm Miniconda3-4.2.12-Linux-x86_64.sh

ENV NB_USER keras
ENV NB_UID 1000

RUN echo "export NB_USER=keras" >> /etc/profile
RUN echo "export NB_UID=1000" >> /etc/profile

RUN echo "export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH" >> /etc/profile
RUN echo "export CPATH=/usr/include:/usr/include/x86_64-linux-gnu:/usr/local/cuda/include:$CPATH" >> /etc/profile
RUN echo "export LIBRARY_PATH=/usr/local/cuda/lib64:/lib/x86_64-linux-gnu:$LIBRARY_PATH" >> /etc/profile
RUN echo "export CUDA_HOME=/usr/local/cuda" >> /etc/profile
RUN echo "export CPLUS_INCLUDE_PATH=$CPATH" >> /etc/profile
RUN echo "export KERAS_BACKEND=tensorflow" >> /etc/profile

RUN useradd -m -s /bin/bash -N -u $NB_UID $NB_USER && \
    mkdir -p $CONDA_DIR && \ 
    chown keras $CONDA_DIR -R  

USER keras

RUN  mkdir -p /home/keras/notebook

# Python
ARG python_version=3.5

RUN conda install -y python=${python_version} && \
    pip install --upgrade pip && \
    pip install tensorflow-gpu && \
    conda install Pillow scikit-learn notebook pandas matplotlib mkl nose pyyaml six h5py && \
    conda install theano pygpu bcolz && \
    pip install keras kaggle-cli lxml opencv-python requests scipy tqdm visdom imgaug && \
    conda install pytorch torchvision cuda80 -c soumith && \
    conda clean -yt

# try alternative approach - 
RUN pip install jupyter_contrib_nbextensions && \
    pip install 'html5lib==0.9999999' && \
    jupyter contrib nbextension install --user

ENV LD_LIBRARY_PATH /usr/local/cuda/lib64:/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
ENV CPATH /usr/include:/usr/include/x86_64-linux-gnu:/usr/local/cuda/include:$CPATH
ENV LIBRARY_PATH /usr/local/cuda/lib64:/lib/x86_64-linux-gnu:$LIBRARY_PATH
ENV CUDA_HOME /usr/local/cuda
ENV CPLUS_INCLUDE_PATH $CPATH
ENV KERAS_BACKEND tensorflow

WORKDIR /home/keras/notebook

EXPOSE 8888 6006 22 8097

CMD jupyter notebook --port=8888 --ip=0.0.0.0 --no-browser

# DOCKERFILE END

nerox8664 Dec 5, 2017

Owner

> tensorflow-gpu (1.3.0)
> keras 2.0.8

Can you upgrade your keras and tensorflow packages?

nerox8664 Dec 5, 2017

Owner

In my experience, some TF + Keras version pairs may work incorrectly.
You can check it out, for example, here: https://docs.floydhub.com/guides/environments/.

snakers4 Dec 6, 2017

Many thanks for your replies and your work/advice - much appreciated.

I upgraded my packages without rebuilding the container using these commands (in case someone reads this):

pip3 install tensorflow-gpu==1.4
pip3 install keras==2.1.1
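After an in-place upgrade like this, it can help to confirm what is actually installed in the environment. The sketch below assumes Python 3.8+ (for `importlib.metadata`) and compares against the version pair the maintainer reported testing with. `TESTED`, `parse_version` and `installed_version` are hypothetical names introduced for illustration.

```python
from importlib import metadata

# Version pair the maintainer reported testing with in this thread.
TESTED = {"keras": (2, 1, 1), "tensorflow": (1, 4, 0)}


def parse_version(text):
    """Turn '2.1.1' into (2, 1, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_version(*names):
    """Version tuple of the first installed distribution in names, else None."""
    for name in names:
        try:
            return parse_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            continue
    return None


checks = (("keras", ["keras"]), ("tensorflow", ["tensorflow-gpu", "tensorflow"]))
for label, names in checks:
    found = installed_version(*names)
    status = "ok" if found == TESTED[label] else "differs from tested version"
    print(label, found, status)
```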

The versions now:
[image: screenshot_3]

I ran a number of tests:

alexnet - Max error: 7.450580596923828e-09
resnet50 - Max error: 8.392333984375e-05

Your advice indeed worked, many thanks.
I will ask my inception question in a separate issue.

@snakers4 snakers4 closed this Dec 6, 2017
