A React Native app built with Expo that allows users to extract text from images, PDFs, and audio using AI-powered OCR and transcription technology. The app features a queue-based job processing system with automatic polling, markdown rendering for rich text display, user authentication, camera functionality, PDF document Q&A with multiple AI models (OpenAI, Ollama, DeepSeek, Gemini), audio recording/upload, and seamless text extraction with copy-to-clipboard features.
- 🔐 Authentication: Login and Register screens with form validation
- 🔄 Automatic Token Refresh: Automatically refreshes access tokens on 401 errors and logs out if refresh fails
- 📸 Camera: Take pictures using the device camera
- 🖼️ Gallery: Select images from the photo gallery
- 📄 PDF Support: Upload and process PDF documents
- 🎤 Audio Recording: Record audio directly in the app using expo-audio
- 🎵 Audio Upload: Upload audio files from device storage
- 🔍 Text Extraction: Extract text from images using OCR API
- 🎙️ Audio Transcription: Transcribe audio recordings and files to text
- 🤖 AI-Powered PDF Q&A: Ask questions about uploaded PDFs using AI models (OpenAI, Ollama, DeepSeek, Gemini)
- 💬 Follow-up Questions: Continue conversations with PDFs using request_id
- 📋 Copy to Clipboard: Copy extracted text with a single tap
- 🔄 Extract Another: Quick workflow to extract text from multiple images/PDFs/audio files
- 🌓 Theme Toggle: Switch between Light, Dark, and System theme modes
- 💾 Theme Persistence: Theme preference saved and restored on app launch
- 🗂️ State Management: Redux with Redux Thunk for API calls
- 🧭 Navigation: React Navigation with authentication guards
- 🔒 Permissions: Proper permission handling for camera, photo library, microphone, and document access
- 🎨 Toast Notifications: Non-intrusive feedback for user actions
- ⏳ Job Queue Polling: Background job processing with automatic status polling for long-running operations
- 📝 Markdown Rendering: Rich markdown display for PDF extraction results
- Node.js (v20.19.4 or higher, the minimum supported by React Native 0.81)
- npm or yarn
- Install dependencies: `npm install`
  Or use Expo's install command to ensure compatible versions: `npx expo install --fix`
- Start the Expo development server: `npm start`
- iOS Simulator: Press `i` in the terminal or run `npm run ios`
- Android Emulator: Press `a` in the terminal or run `npm run android`
- Web Browser: Press `w` in the terminal or run `npm run web`
- Physical Device: Scan the QR code with the Expo Go app
├── App.tsx # App entry point with Redux Provider and Toast
├── app.json # Expo configuration with permissions
├── app.apk # Android build artifact (optional)
├── assets/ # Static assets (icons, splash screen)
│   ├── adaptive-icon.png
│   ├── favicon.png
│   ├── icon.png
│   └── splash.png
├── babel.config.js # Babel configuration
├── config.ts # API configuration (hard-coded base URL)
├── eas.json # EAS build configuration
├── jest.config.js # Jest configuration for Expo/React Native
├── package.json # Dependencies and scripts
├── package-lock.json # Lockfile
├── README.md
├── src/
│   ├── components/
│   │   ├── AppHeader.tsx # Reusable header component
│   │   ├── ImagePickerComponent.tsx
│   │   ├── MarkdownRenderer.tsx # Reusable markdown display component
│   │   ├── OpenaiPassModal.tsx # Modal for OpenAI pass input
│   │   └── ThemeToggle.tsx # Theme toggle component
│   ├── navigation/
│   │   └── AppNavigator.tsx
│   ├── screens/
│   │   ├── __tests__/
│   │   │   ├── HomeScreen.test.tsx
│   │   │   ├── LoginScreen.test.tsx
│   │   │   ├── PdfScreen.test.tsx
│   │   │   ├── RegisterScreen.test.tsx
│   │   │   └── SoundScreen.test.tsx
│   │   ├── HomeScreen.tsx # Image to text screen
│   │   ├── PdfScreen.tsx # PDF to text screen
│   │   ├── SoundScreen.tsx # Audio to text screen
│   │   ├── LoginScreen.tsx
│   │   └── RegisterScreen.tsx
│   ├── store/
│   │   ├── actions/
│   │   │   ├── __tests__/
│   │   │   │   └── authActions.test.ts
│   │   │   ├── helpers/
│   │   │   │   └── pollJobStatus.ts # Job polling helper for queue-based APIs
│   │   │   ├── authActions.ts
│   │   │   ├── imageActions.ts # Image extraction with job polling
│   │   │   ├── pdfActions.ts # PDF extraction and Q&A with job polling
│   │   │   ├── audioActions.ts # Audio transcription with job polling
│   │   │   └── themeActions.ts # Theme management actions
│   │   ├── reducers/
│   │   │   ├── __tests__/
│   │   │   │   └── authReducer.test.ts
│   │   │   ├── authReducer.ts
│   │   │   ├── imageReducer.ts
│   │   │   ├── pdfReducer.ts # PDF state management
│   │   │   ├── audioReducer.ts # Audio state management
│   │   │   ├── themeReducer.ts # Theme state management
│   │   │   └── index.ts
│   │   ├── types/
│   │   │   ├── authTypes.ts
│   │   │   ├── imageTypes.ts
│   │   │   ├── pdfTypes.ts # PDF-related types
│   │   │   ├── audioTypes.ts # Audio-related types
│   │   │   └── themeTypes.ts # Theme-related types
│   │   └── index.ts # Redux store configuration and typed hooks
│   ├── types/ # Additional shared types (placeholder)
│   └── utils/
│       ├── __tests__/
│       │   └── apiClient.test.ts
│       ├── apiClient.ts # API client with automatic token refresh
│       └── validation.ts
└── tsconfig.json # TypeScript configuration
Login form validation:
- Email: Required, must be a valid email format
- Password: Required, must be at least 6 characters
- Password Visibility: Toggle to show/hide password

Register form validation:
- Name: Required, must be at least 2 characters
- Email: Required, must be a valid email format
- Password: Required, must be at least 6 characters
- Password Visibility: Toggle to show/hide password
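The validation rules above could be expressed as small pure functions. The following is a minimal sketch, not the contents of `validation.ts`: the function names, the error messages, and the email pattern are all assumptions made for illustration.

```typescript
// A validator returns an error message, or null when the value is acceptable.
type Validator = (value: string) => string | null;

// Assumption: a simple structural check. The app's actual email rule is not documented here.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

const validateEmail: Validator = (value) => {
  const trimmed = value.trim();
  if (trimmed.length === 0) return "Email is required";
  return EMAIL_PATTERN.test(trimmed) ? null : "Enter a valid email address";
};

const validatePassword: Validator = (value) => {
  if (value.length === 0) return "Password is required";
  return value.length >= 6 ? null : "Password must be at least 6 characters";
};

const validateName: Validator = (value) => {
  const trimmed = value.trim();
  if (trimmed.length === 0) return "Name is required";
  return trimmed.length >= 2 ? null : "Name must be at least 2 characters";
};
```

Returning `null` for a valid value lets a screen show the first non-null message per field without extra bookkeeping.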
PDF Q&A workflow:
- PDF Upload: User uploads a PDF document
- Model Selection: User selects an AI model (OpenAI, Ollama, DeepSeek, or Gemini)
- OpenAI Pass Entry (if OpenAI selected): Modal appears for secure OpenAI pass input with visibility toggle
- Question Input: User enters a question about the PDF
- Job Queuing: System queues the job and returns a message_id
- Background Polling: App polls job status every 10 seconds until completion
- View Results: Extracted text (rendered as markdown) and description are displayed
- Follow-up Questions: User can ask additional questions using the same PDF (request_id persists)
- Fresh PDF: User can upload a new PDF to start a new session
Audio transcription workflow:
- Audio Input: User chooses to record audio or upload an audio file
  - Record: Tap "Record Audio" to start recording, tap "Stop Recording" when done
  - Upload: Tap "Upload Audio" to select an audio file from device storage
- Audio Preview: Recording duration or file name is displayed
- Transcription: User clicks "Transcribe Audio"
- Job Queuing: System queues the transcription job and returns a message_id
- Background Polling: App polls job status every 10 seconds until completion
- View Results: Transcribed text is displayed with copy icon
- Copy Text: User can copy text to clipboard with toast notification
- Transcribe Another: User can transcribe another audio recording or file
Image text extraction workflow:
- Image Selection: User takes a picture or selects from gallery
- Text Extraction: User clicks "Extract Text from Picture"
- Job Queuing: System queues the extraction job and returns a message_id
- Background Polling: App polls job status every 10 seconds until completion
- View Results: Extracted text is displayed with copy icon
- Copy Text: User can copy text to clipboard with toast notification
- Extract Another: User can extract text from another image
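The image workflow above maps naturally onto a small reducer over the `extractedText`, `extracting`, and `error` fields. This is a sketch only: the action type names are hypothetical, and the real constants in `imageTypes.ts` may differ.

```typescript
interface ImageState {
  extractedText: string | null;
  extracting: boolean;
  error: string | null;
}

// Hypothetical action names, chosen for illustration.
type ImageAction =
  | { type: "EXTRACT_TEXT_REQUEST" }
  | { type: "EXTRACT_TEXT_SUCCESS"; payload: string }
  | { type: "EXTRACT_TEXT_FAILURE"; payload: string }
  | { type: "EXTRACT_TEXT_RESET" };

const initialImageState: ImageState = {
  extractedText: null,
  extracting: false,
  error: null,
};

function imageReducer(
  state: ImageState = initialImageState,
  action: ImageAction,
): ImageState {
  switch (action.type) {
    case "EXTRACT_TEXT_REQUEST":
      // Job queued and polling started: show a loading state, clear old errors.
      return { ...state, extracting: true, error: null };
    case "EXTRACT_TEXT_SUCCESS":
      return { extractedText: action.payload, extracting: false, error: null };
    case "EXTRACT_TEXT_FAILURE":
      return { ...state, extracting: false, error: action.payload };
    case "EXTRACT_TEXT_RESET":
      // "Extract Another": return to a clean slate.
      return initialImageState;
    default:
      return state;
  }
}
```

The audio flow would follow the same pattern with `transcribedText` and `transcribing` in place of the image fields.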
The app uses Redux with Redux Thunk for:
- Authentication state: user, tokens (accessToken, refreshToken), isAuthenticated, loading, error
- Image extraction state: extractedText, extracting, error
- PDF extraction state: extractedText, description, requestId, extracting, error
- Audio transcription state: transcribedText, transcribing, error
- Theme state: mode (light/dark/system), persisted with AsyncStorage
- API call management: All API calls handled through thunk actions with automatic token refresh
- Job Queue Polling: Automatic polling every 10 seconds for long-running jobs (PDF, audio, image extraction)
- Token refresh: Automatic token refresh on 401 errors, logout on refresh failure
- Navigation guards: Automatically redirects based on auth state
- Unauthenticated: Shows Login and Register screens
- After Login: Automatically navigates to Home screen
- After Register: Shows success message and navigates to Login screen
- After Logout: Returns to Login screen
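The state slices and navigation rules listed above can be summarized as types plus one guard function. This is a sketch under assumptions: the field names come from the lists above, but the nullability of each field, the shape of `user`, the slice keys in `RootState`, and the route names are guesses.

```typescript
type ThemeMode = "light" | "dark" | "system";

interface AuthState {
  user: { name: string; email: string } | null; // user shape is an assumption
  accessToken: string | null;
  refreshToken: string | null;
  isAuthenticated: boolean;
  loading: boolean;
  error: string | null;
}

interface ImageState {
  extractedText: string | null;
  extracting: boolean;
  error: string | null;
}

interface PdfState {
  extractedText: string | null;
  description: string | null;
  requestId: string | null; // kept so follow-up questions reuse the same PDF
  extracting: boolean;
  error: string | null;
}

interface AudioState {
  transcribedText: string | null;
  transcribing: boolean;
  error: string | null;
}

interface ThemeState {
  mode: ThemeMode;
}

// Slice keys are hypothetical.
interface RootState {
  auth: AuthState;
  image: ImageState;
  pdf: PdfState;
  audio: AudioState;
  theme: ThemeState;
}

// Hypothetical route names for the five screens.
type ScreenName = "Login" | "Register" | "Home" | "Pdf" | "Sound";

// Navigation guard: the set of reachable screens depends only on auth state.
function screensFor(auth: Pick<AuthState, "isAuthenticated">): ScreenName[] {
  return auth.isAuthenticated ? ["Home", "Pdf", "Sound"] : ["Login", "Register"];
}
```

Because the navigator renders only the screens returned for the current auth state, logging in or out switches stacks without any explicit redirect call.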
The app requests the following permissions:
- iOS: Camera, Photo Library, and Microphone access
- Android: Camera, Read/Write External Storage, and Record Audio
These permissions are configured in app.json and will be requested at runtime when needed.
- `expo` (~54.0.0) - Expo SDK
- `react` (19.1.0) - React library
- `react-native` (0.81.5) - React Native framework

- `@react-navigation/native` - Navigation library
- `@react-navigation/native-stack` - Stack navigator
- `react-native-screens` - Native screen components
- `react-native-safe-area-context` - Safe area handling

- `@reduxjs/toolkit` - Redux Toolkit
- `react-redux` - React bindings for Redux
- `redux` - State management
- `redux-thunk` - Async action middleware

- `expo-image-picker` - Camera and gallery access
- `expo-document-picker` - PDF and document file selection
- `expo-audio` - Audio recording and playback
- `expo-clipboard` - Clipboard functionality
- `react-native-toast-message` - Toast notifications
- `expo-status-bar` - Status bar component
- `@react-native-async-storage/async-storage` - Persistent storage for theme preferences
- `react-native-paper` - Material Design 3 components with theme support
- `react-native-markdown-display` - Markdown rendering for extracted text

- `typescript` - TypeScript compiler
- `@types/react` - React TypeScript types
- `@babel/core` - Babel compiler
- `jest` & `jest-expo` - Testing framework for React Native/Expo
- `@testing-library/react-native` - Testing utilities for React Native
- `@testing-library/jest-native` - Extended Jest matchers for React Native
- `redux-mock-store` - Mock store for testing Redux thunks
- `@types/redux-mock-store` - TypeScript types for redux-mock-store
Unit tests cover screens, reducers, actions, and utilities using Jest and React Testing Library for React Native.
Run all tests with `npm run test`. The suites cover:
- Screens: HomeScreen, PdfScreen, SoundScreen, LoginScreen, RegisterScreen
- Reducers: authReducer (including refresh token actions)
- Actions: authActions (including refresh token functionality)
- Utilities: apiClient (automatic token refresh on 401 errors)
Test Results: 79 tests passing across 8 test suites
Run individual suites with:
npx jest src/screens/__tests__/HomeScreen.test.tsx
npx jest src/screens/__tests__/PdfScreen.test.tsx
npx jest src/screens/__tests__/SoundScreen.test.tsx
npx jest src/screens/__tests__/LoginScreen.test.tsx
npx jest src/screens/__tests__/RegisterScreen.test.tsx
npx jest src/store/reducers/__tests__/authReducer.test.ts
npx jest src/store/actions/__tests__/authActions.test.ts
npx jest src/utils/__tests__/apiClient.test.ts

The Jest configuration is located in jest.config.js and is preconfigured for Expo SDK 54. Tests include mocks for AsyncStorage, SafeAreaContext, expo-audio, and icon libraries.
Local EAS preview build:
eas build -p android --profile preview --local --output=app.apk

Set any required environment variables in your shell or CI before running the build.
- Framework: React Native with Expo SDK 54
- Language: TypeScript
- State Management: Redux + Redux Thunk
- Navigation: React Navigation v6
- UI Components: React Native Paper (Material Design 3) with theme support
- API Communication: Fetch API with FormData support
- Storage: AsyncStorage for persistent theme preferences
- Theme System: Custom light/dark themes with system preference support
The project is fully typed with TypeScript. All components, actions, and reducers are typed for better development experience and error prevention.
- Components: Reusable UI components
  - AppHeader: Consistent header with title, subtitle, theme toggle, and optional logout
  - ImagePickerComponent: Camera and gallery image selection
  - MarkdownRenderer: Themed markdown display component for rich text rendering
  - OpenaiPassModal: Secure modal for OpenAI pass input with visibility toggle
  - ThemeToggle: Theme mode selector (light/dark/system)
- Screens: Full-screen components for navigation (Home, PDF, Sound, Login, Register)
- Store: Redux store with actions, reducers, and types
  - Actions: Async thunk actions for API calls with automatic token refresh
  - Reducers: State reducers for auth (including refresh token), image, PDF, audio, and theme
  - Types: TypeScript interfaces and types for type safety
- Utils: Utility functions
  - apiClient: API call wrapper with automatic 401 handling and token refresh
  - validation: Form validation helpers
The app supports three theme modes:
- Light Mode: Custom light theme with optimized colors
- Dark Mode: Custom dark theme with optimized colors
- System Mode: Automatically follows device theme preference
Theme preference is persisted using AsyncStorage and restored on app launch. Users can toggle themes using the ThemeToggle component available in the header of Home and PDF screens.
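As an illustration of the three modes and their persistence, the sketch below resolves a stored mode against the device color scheme. The storage interface mirrors the `getItem`/`setItem` subset of AsyncStorage; the storage key, the function names, and the fallback to `"system"` when nothing is stored are assumptions.

```typescript
type ThemeMode = "light" | "dark" | "system";
type ColorScheme = "light" | "dark";

// Subset of the AsyncStorage API, injected so the logic is testable without a device.
interface KeyValueStorage {
  getItem(key: string): Promise<string | null>;
  setItem(key: string, value: string): Promise<void>;
}

const THEME_KEY = "themeMode"; // hypothetical storage key

function isThemeMode(value: string | null): value is ThemeMode {
  return value === "light" || value === "dark" || value === "system";
}

// "system" defers to the device setting; explicit modes win over it.
function resolveTheme(
  mode: ThemeMode,
  systemScheme: ColorScheme | null | undefined,
): ColorScheme {
  if (mode === "system") {
    return systemScheme === "dark" ? "dark" : "light";
  }
  return mode;
}

async function loadThemeMode(storage: KeyValueStorage): Promise<ThemeMode> {
  const stored = await storage.getItem(THEME_KEY);
  // Assumption: fall back to "system" when nothing valid is stored.
  return isThemeMode(stored) ? stored : "system";
}

async function saveThemeMode(
  storage: KeyValueStorage,
  mode: ThemeMode,
): Promise<void> {
  await storage.setItem(THEME_KEY, mode);
}
```

In the app, `systemScheme` would typically come from React Native's `useColorScheme()` hook, and `loadThemeMode` would run once at launch before the first themed render.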
- Initial Upload: Upload PDF and ask first question
- Follow-up Questions: Continue asking questions using the `request_id` from previous responses
- Session Management: PDF session persists until user uploads a fresh PDF or clears the session
- Model Selection: Choose between OpenAI, Ollama, DeepSeek, and Gemini models for processing
- OpenAI Pass Security: Secure modal for entering OpenAI pass with password visibility toggle
- Response Display: Shows both extracted content (with markdown rendering) and description from API responses
- Markdown Support: Extracted text is rendered with full markdown support (headings, lists, code blocks, tables, etc.)
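To make the session behavior concrete, the sketch below builds the text fields for a PDF question. It is illustrative only: the field names (`model`, `question`, `openai_pass`), the model identifiers, and the choice to throw when the OpenAI pass is missing are assumptions. Only `request_id` is named in this README.

```typescript
// Model identifiers are assumed; the README lists the providers, not the wire values.
type PdfModel = "openai" | "ollama" | "deepseek" | "gemini";

interface PdfQuestion {
  model: PdfModel;
  question: string;
  requestId: string | null; // null until a first job has completed
  openaiPass?: string;
}

// A fresh session (no request_id yet) must include the PDF file itself.
function needsPdfUpload(requestId: string | null): boolean {
  return requestId === null;
}

// Returns the text fields to append to the request's FormData.
function buildPdfFields(input: PdfQuestion): Record<string, string> {
  const fields: Record<string, string> = {
    model: input.model,
    question: input.question,
  };
  if (input.model === "openai") {
    if (!input.openaiPass) {
      throw new Error("OpenAI pass is required when the OpenAI model is selected");
    }
    fields["openai_pass"] = input.openaiPass;
  }
  if (input.requestId !== null) {
    // Follow-up question: reuse the existing PDF session.
    fields["request_id"] = input.requestId;
  }
  return fields;
}
```

Uploading a fresh PDF would reset `requestId` to `null`, so the next call omits `request_id` and starts a new session.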
The app implements a queue-based system for long-running operations:
- Queue Submission: API calls return immediately with a `message_id`
- Status Polling: App polls `/job/{message_id}` every 10 seconds
- Pending State: Jobs with `status: "pending"` continue polling
- Completion: Jobs return `content`, `description`, and `request_id` when complete
- Shared Helper: `pollJobStatus` helper function used across image, PDF, and audio actions
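The polling loop described above might look like the following sketch. It is not the code in `pollJobStatus.ts`: the signature, the injected `fetchJob` callback, and the `maxAttempts` cap are assumptions added so the example is self-contained and cannot loop forever.

```typescript
// Job payload fields follow the README; optionality is an assumption.
interface JobResponse {
  status: string; // "pending" while the job is queued or running
  content?: string;
  description?: string;
  request_id?: string;
}

// Injected fetcher, e.g. a wrapper around GET /job/{message_id}.
type FetchJob = (messageId: string) => Promise<JobResponse>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function pollJobStatus(
  messageId: string,
  fetchJob: FetchJob,
  intervalMs = 10000, // the README states a 10-second interval
  maxAttempts = 60, // assumption: an upper bound on polling
): Promise<JobResponse> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJob(messageId);
    if (job.status !== "pending") {
      return job; // any non-pending status ends the loop
    }
    await sleep(intervalMs);
  }
  throw new Error(`Job ${messageId} still pending after ${maxAttempts} attempts`);
}
```

Injecting `fetchJob` keeps the helper reusable across the image, PDF, and audio actions and lets tests pass a short interval instead of waiting 10 seconds per poll.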
- Token Management: Access tokens and refresh tokens stored in Redux state
- Automatic Token Refresh: API client automatically refreshes tokens on 401 errors
- Session Expiration: User is automatically logged out if token refresh fails
- Secure API Calls: All API calls include authorization headers when authenticated
- OpenAI Pass Handling: OpenAI pass is securely sent with requests when OpenAI model is selected
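The refresh-on-401 behavior can be sketched as a wrapper that retries a request once. This is an assumption-laden illustration rather than the actual `apiClient.ts`: the `callWithRefresh` name, the callback-based dependencies, and the `Bearer` authorization scheme are not confirmed by this README.

```typescript
// Minimal response shape; the real client works with Fetch API responses.
interface ApiResponse {
  status: number;
  body?: unknown;
}

// Hypothetical stand-ins for the app's store access and auth thunks.
interface AuthDeps {
  getAccessToken: () => string | null;
  refreshTokens: () => Promise<string>; // resolves with a new access token
  logout: () => void;
}

// Performs the request with the given headers.
type Send = (headers: Record<string, string>) => Promise<ApiResponse>;

async function callWithRefresh(send: Send, deps: AuthDeps): Promise<ApiResponse> {
  // Assumption: the API expects a Bearer token.
  const withAuth = (token: string | null): Record<string, string> =>
    token ? { Authorization: `Bearer ${token}` } : {};

  const first = await send(withAuth(deps.getAccessToken()));
  if (first.status !== 401) {
    return first;
  }

  // Access token rejected: try to refresh, and log out if that fails.
  const newToken = await deps.refreshTokens().catch((error: unknown) => {
    deps.logout();
    throw error;
  });

  // Retry exactly once with the refreshed token.
  return send(withAuth(newToken));
}
```

Retrying only once avoids an infinite loop when the server keeps returning 401 even after a successful refresh.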