We build and test various machine learning models to determine the best way to find the pH of a biochemical solution, given an image of a pH test strip that has been applied to that solution. We also critique existing literature on this topic.


In this project, we apply several machine learning techniques and models to classify and estimate the pH of solutions given raw image data of pH test strips. We replicate and critique much of the work performed by Mutlu et al., who used LS-SVM and claimed to achieve 100% classification accuracy. We believe that this high accuracy resulted from the same physical pH strip samples appearing in both the training and validation datasets. This is problematic because the noise between different strips of the same class was not accounted for, and because pre-processing removed most of the variability introduced by photographing strips at different orientations, leaving the samples essentially duplicated. We further find that regression is a more suitable approach for this domain, as pH values lie on a continuous, logarithmic scale, and decimal differences in value can have significant biological consequences. In this spirit, we find that regression can achieve mean squared errors as low as ~0.033.
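The leakage concern above can be illustrated with a minimal sketch of a split that keeps every image of one physical strip on the same side of the train/validation boundary, together with the mean squared error used to score a regression model. This is not the code from this repository: the function names, the `strip_ids` labels, and the example values are all hypothetical, and it assumes each image is tagged with an identifier for the physical strip it came from.

```python
import random
from collections import defaultdict


def split_by_strip(strip_ids, val_fraction=0.25, seed=0):
    """Return (train_idx, val_idx) so that no strip ID appears in both sets."""
    # Group image indices by the physical strip they were taken from.
    by_strip = defaultdict(list)
    for idx, sid in enumerate(strip_ids):
        by_strip[sid].append(idx)

    # Shuffle whole strips, not individual images.
    strips = sorted(by_strip)
    random.Random(seed).shuffle(strips)

    n_val = max(1, round(len(strips) * val_fraction))
    val_strips = strips[:n_val]
    train_strips = strips[n_val:]

    train_idx = [i for s in train_strips for i in by_strip[s]]
    val_idx = [i for s in val_strips for i in by_strip[s]]
    return train_idx, val_idx


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted pH values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical example: 4 physical strips, each photographed 3 times.
strip_ids = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"]
train_idx, val_idx = split_by_strip(strip_ids)
```

Splitting by strip rather than by image means the validation score reflects strip-to-strip noise, which an image-level shuffle would hide.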

File Descriptions

code contains all the code used in this project. More specific descriptions are available inside that directory.

data_matrices contains the image data as RGB values of each of the four color patches. Further pre-processing was selectively applied within the code itself.

masked_data contains the image data with all but the four color patches masked away (the intermediate step between raw_data and data_matrices).

raw_data contains the original image data.

paper.pdf is the report we wrote that documents our results and makes comparisons to existing literature.
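
As a rough illustration of the step from masked_data to data_matrices, the sketch below reduces one masked image to a single row of 12 features (R, G, B for each of the four color patches). It is an assumption that each patch is summarized by its mean color; the function name `patch_features`, the bounding-box format, and the synthetic image are hypothetical and are not taken from the code directory.

```python
import numpy as np


def patch_features(masked_image, patch_boxes):
    """Return a 1-D array of mean R, G, B values for each patch.

    masked_image: array of shape (height, width, 3).
    patch_boxes: list of (top, bottom, left, right) pixel bounds, one per patch.
    """
    features = []
    for top, bottom, left, right in patch_boxes:
        # Flatten the patch to a list of pixels and average each channel.
        patch = masked_image[top:bottom, left:right].reshape(-1, 3)
        features.extend(patch.mean(axis=0))
    return np.array(features, dtype=float)


# Synthetic 8x2 image with four stacked 2x2 patches of uniform color.
image = np.zeros((8, 2, 3))
boxes = [(0, 2, 0, 2), (2, 4, 0, 2), (4, 6, 0, 2), (6, 8, 0, 2)]
for k, (top, bottom, left, right) in enumerate(boxes):
    image[top:bottom, left:right] = [10 * k, 20 * k, 30 * k]

row = patch_features(image, boxes)
```

Stacking one such row per image would give a matrix with one sample per row, which is the shape most regression and classification libraries expect.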


Aman Dhar, Rudra Mehta, Matthew Sit.

All three of us are undergraduates in CS 189, completing this project for extra credit in Fall 2017. Rudra and Matthew submitted a more biologically focused treatment of this topic as their final project for BioE 134, Genetic Design Automation. Some background research, data collection, and fundamental modeling approaches may overlap, but the written deliverables and areas of analysis are unique. Permission to submit a related project to CS 189 has been provided by the BioE 134 professor, John Christopher Anderson (jcanderson [at], as well as the CS 189 professor, Anant Sahai (sahai [at]


  1. J. Anderson, 2017 10 09-Final Project, UC Berkeley, 2017.
  2. A. Mutlu, V. Kılıç, G. Özdemir, A. Bayram, N. Horzum and M. Solmaz, Smartphone-based colorimetric detection via machine learning, The Analyst, vol. 142, no. 13, pp. 2434-2441, 2017.
  3. S. Kim, Y. Koo and Y. Yun, A Smartphone-Based Automatic Measurement Method for Colorimetric pH Detection Using a Color Adaptation Algorithm, Sensors, vol. 17, no. 7, p. 1604, 2017.
  4. H. Farid, Blind inverse gamma correction, IEEE Transactions on Image Processing, vol. 10, no. 10, pp. 1428-1433, 2001.
  5. American Type Culture Collection, C2C12 (ATCC®CRL-177TM), Product Sheet, Manassas, VA.